Text File | 1994-09-04 | 109.6 KB | 2,131 lines |
- CHAPTER 3
-
- PROGRAMMING METHODS
-
- In Chapters 1 and 2 you learned how the BASIC compiler translates a source
- file into the equivalent assembly language statements, and how it allocates
- memory to store variables and constants. In particular, you saw that the
- BC compiler generates assembly language code directly for some statements,
- while for others it creates calls to routines in the BASIC libraries. Most
- of the code examples presented in those chapters dealt with simple variable
- assignments and calculations.
- Of course, the compiler must do much more than merely assign and
- manipulate variables and other data. Equally important is controlling how
- your program operates, and determining which paths are to be taken as it
- progresses. In this chapter we will delve into the inner workings of
- control flow structures, with an eye toward writing programs that are as
- efficient as possible. As with the earlier chapters, this discussion
- includes numerous disassemblies of compiled BASIC code. Thus, you will see
- exactly what the compiler does, and how each control flow statement is
- handled.
- This chapter also discusses the design of both static and non-static
- subprograms and functions, and compares the relative merits of each method.
- Many programmers do not fully understand the term Static, and find the
- related subject of recursive subroutines especially difficult to grasp.
- BASIC supports four types of subroutines, and each will be described in
- this chapter: GOSUB routines, subprograms, DEF FN functions, and what I
- call "formal functions". You will notice that I use the terms subroutine
- and procedure interchangeably, to indicate a single block of code that may
- be executed more than once. You will also learn how parameters are passed
- to these procedures.
- Finally, in this chapter I will discuss programming style. Programming
- in any language is arguably as much of an art as it is a science. But
- unlike, say, music, where a composer can write any sequence of notes and
- proclaim them acceptable, a computer program must at least work correctly.
- There are an infinite number of ways to accomplish any programming task,
- and I can make recommendations only. Which approach you choose will
- reflect both your own personal taste and style, as well as your current
- level of competence and understanding of programming in general.
-
-
- CONTROL FLOW
- ============
-
- All programs--regardless of the language in which they are written--require
- a mechanism for testing certain conditions and then performing different
- actions based on those conditions. Although there are many ways to perform
- tests and branches in a BASIC program, all of them do essentially the same
- thing. The BASIC control flow statements are GOTO, DO/LOOP, WHILE/WEND,
- IF/THEN/ELSE, FOR/NEXT, SELECT CASE, ON GOTO, and ON GOSUB. Because the
- capabilities of WHILE/WEND are also available with a DO/LOOP construct, the
- two will be discussed together.
- In almost all cases, the BASIC compiler directly generates the code that
- controls a program's flow. One exception is when floating point values are
- used as a FOR counter, or as a WHILE or UNTIL condition. In those
- situations, calls are made to the floating point comparison routines in the
- BASIC runtime library. Another place is when you have a statement such as
- CASE ASC(X$), or IF LEFT$(X$, 10) = Y$. ASC and LEFT$ are also subroutines
- in the BASIC language library, and they too are invoked by calls.
- It is important to reiterate that when dealing with integer test
- conditions, BC will in many cases create assembly language code that is as
- good as a human programmer would write. In the short program fragment that
- follows, all of the BASIC source code is shown translated to the equivalent
- assembly language statements. This listing was derived by compiling and
- linking the BASIC program for Microsoft CodeView, and then using CodeView
- to display the resultant code.
-
-
- This is what you write:
-
- DO
- X% = X% + 1
- LOOP WHILE X% < 100
-
-
- This is the result after compilation:
-
- 30:
- INC WORD PTR [X%] ;X% = X% + 1
- CMP WORD PTR [X%],64 ;compare X% to 100
- JL 30 ;jump if less to 30
-
-
- Here the variable X% is incremented, and then compared to the value 100.
- (64 is the Hex equivalent of 100, which is how CodeView displays values.)
- If X% is indeed less than 100, the program jumps back to address 30 and
- continues processing the loop. Notice that while this example does not use
- a named label in the BASIC source code as the target for a GOTO, the
- equivalent assembly language code does. In this case, the label is the
- code at address 30. Do not confuse the addresses that assembly language
- must use as jump targets with the numbered labels that in BASIC are
- optional.
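- To check such hexadecimal values yourself, BASIC's HEX$ function and the
- &H prefix convert in both directions. This is only a small illustration
- and is not part of the compiled listing above.
-
-
- PRINT HEX$(100)     'displays 64
- PRINT &H64          'displays 100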
-
-
- THE DREADED GOTO
-
- Modern programming philosophy dictates that GOTO and GOSUB statements
- should be avoided at all cost, in favor of DO and WHILE loops. However,
- all of these methods result in nearly identical code. Indeed, there is
- nothing inherently wrong with using GOTO when circumstances warrant it.
- By examining the program listing below, you will see that BASIC generates
- identical code for a GOTO and for a DO loop.
-
-
- This is what you write:
-
- Label:
- X% = X% + 1
- IF X% < 100 THEN GOTO Label
-
-
- This is the result after compilation:
-
- 30:
- INC WORD PTR [X%] ;X% = X% + 1
- CMP WORD PTR [X%],64 ;compare X% to 100
- JL 30 ;jump if less to 30
-
-
- Since GOTO and DO/LOOP produce the same results, which one is better, and
- why? In general, a DO/LOOP is preferable for two reasons. First, it is
- a nuisance to have to create a new and unique label name for every location
- that a program may need to branch to. Admittedly, in a short program this
- will not be a problem. But in a large application with many small loops
- that test for keyboard input, you end up creating many labels with names
- such as GetKey1, GetKey2, and so forth. And if you inadvertently use the
- wrong label name, your program will not work correctly.
- More important, however, is that for each label you define in a program,
- the BC compiler must remember its name and the equivalent address in the
- object code that the label identifies. Since label names can be as long
- as 40 characters and memory addresses require 2 bytes each to identify, a
- finite number of label names can be accommodated. By avoiding unnecessary
- labels, you are giving BC that much more memory to use for compiling your
- program.
- There are several situations in which GOTO is preferable to a DO or
- WHILE loop. Indeed, one of my personal pet peeves is when a programmer
- tries to shoehorn structure into a program no matter what the cost.
- Consider the three different code fragments below; each waits for a key
- press and then assigns it to the variable Ky$.
-
-
- This approach is the worst:
-
- Ky$ = ""
- WHILE Ky$ = ""
- Ky$ = INKEY$
- WEND
-
-
- This method is better:
-
- Label:
- Ky$ = INKEY$
- IF Ky$ = "" GOTO Label
-
-
- And this is better still:
-
- DO
- Ky$ = INKEY$
- LOOP WHILE Ky$ = ""
-
-
- In the first example, an extra step is needed solely to clear Ky$ to a null
- string, so the initial WHILE will be true and execute at least once. Every
- string assignment adds 13 bytes to a program, and those 13 bytes can add
- up quickly in a large application.
- The second example avoids the unnecessary assignment, but adds a label
- for GOTO to jump to. Although this label does require a small amount of
- additional memory while the program is being compiled, it does not increase
- the size of the final executable program file.
- The last example is better still, because it avoids the need for a line
- label and also avoids an extra string assignment. Since a DO loop allows
- the test to be placed at either the top or bottom of the loop, you can
- force the loop to be executed at least once by putting the test at the
- bottom as shown here.
- However, even this can be improved upon by eliminating the string
- comparison that checks if Ky$ is equal to a null string. If we replace
- LOOP WHILE Ky$ = "" with LOOP UNTIL LEN(Ky$), only 13 bytes of code are
- generated instead of 15. When two strings are compared (Ky$ and ""), each
- must be passed to the string comparison routine. Since LEN requires only
- one argument, the code to pass the second parameter is avoided.
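- Putting these pieces together, the complete key loop might be written as
- in the following sketch. The PRINT line is added here only to show that
- Ky$ holds the key once the loop ends.
-
-
- DO
-   Ky$ = INKEY$
- LOOP UNTIL LEN(Ky$)      'LEN is nonzero once a key arrives
- PRINT "You pressed: "; Ky$
-
-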
- There are some situations for which the GOTO is ideally suited. In the
- first two examples below, a complex expression is used as the condition for
- executing a DO WHILE loop, and the same expression is then used again
- within the loop.
-
-
- DO WHILE (X% + Y%) * Z% > 13
- IF (X% + Y%) * Z% = 100 THEN PRINT
- ...
- ...
- LOOP
-
-
- DO WHILE ASC(MID$(S$, A%, B%)) > 13
- IF ASC(MID$(S$, A%, B%)) > 100 THEN PRINT
- ...
- ...
- LOOP
-
-
- Label:
- Temp% = ASC(MID$(S$, A%, B%))
- IF Temp% > 13 THEN
- IF Temp% > 100 THEN PRINT
- ...
- ...
- GOTO Label
- END IF
-
-
- In the first example, BASIC remembers the results of its test that checks
- if (X% + Y%) * Z% is greater than 13, and it uses the result it just
- calculated in the next test that compares the same expression to 100. This
- is one more example of the kinds of optimizations BC performs as it
- compiles your programs. String expressions such as those used in the
- second example are of necessity more complex, and require calls to library
- routines. With this added complexity, BASIC unfortunately cannot retain
- the result of the earlier comparison, and it generates identical code a
- second time.
- A more elegant solution in this case is therefore the GOTO as shown in
- the last example. Because the result of evaluating the expression is saved
- manually, it may be reused within the loop. As proof, the second DO WHILE
- example above requires 73 bytes to implement, as opposed to only 53 when
- Temp% and GOTO are used.
- I should also point out that the most common and valuable use for GOTO
- is to get out of a deeply nested series of IF or other blocks of code. It
- is not uncommon to have a FOR/NEXT loop that contains a SELECT CASE block,
- and within that a series of IF/ELSE tests. The only way to jump out of all
- three levels at once is with a GOTO.
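- As a sketch of that situation, the fragment below searches an array and
- leaves a FOR/NEXT loop, a SELECT CASE block, and an IF test with a single
- GOTO. The names Table%, Target%, Found%, and Done are invented for this
- illustration.
-
-
- DIM Table%(1 TO 100)
- Target% = 42                  'hypothetical value to search for
- Table%(37) = Target%          'plant one match so the search succeeds
- Found% = 0
- FOR X% = 1 TO 100
-   SELECT CASE Table%(X%)
-     CASE IS > 0
-       IF Table%(X%) = Target% THEN
-         Found% = X%
-         GOTO Done             'leave all three levels at once
-       END IF
-     CASE ELSE
-   END SELECT
- NEXT
- Done:
- PRINT "Found at element"; Found%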
-
-
- FOR/NEXT LOOPS
-
- Unlike WHILE and DO loops that can test for nearly any condition and at
- either the top or bottom of the loop, a FOR/NEXT loop is intended to
- perform a block of statements a fixed number of times. A FOR/NEXT loop
- could also be replaced with code that compares a value and uses GOTO to
- reenter the loop if needed, but that is hardly necessary. My point is to
- yet again illustrate that all of BASIC's seemingly fancy constructs are no
- more than tests and GOTOs deep down at the assembly language level.
- A FOR/NEXT loop determines the number of iterations that will be
- executed once ahead of time, before the loop begins. For example, the
- listing below shows a loop that changes the upper limit inside the loop.
- However the loop still executes 10 times.
-
-
- Limit% = 10
- FOR X% = 1 TO Limit%
- Limit% = 5
- PRINT Limit%
- NEXT
-
-
- During compilation, BASIC translates the FOR/NEXT loop in the previous
- example into the equivalent of the following.
-
-
- Limit% = 10
- Temp% = Limit%
- X% = 1
- GOTO Next
- For:
- Limit% = 5
- PRINT Limit%
- X% = X% + 1
- Next:
- IF X% <= Temp% THEN GOTO For
-
-
- Please understand that changing a loop condition inside the loop is
- considered bad practice, because the program becomes difficult to
- understand. If you really need to alter the limit inside a loop, the loop
- should be recoded to use WHILE or DO instead. Another good reason for
- avoiding such code is that future versions of BASIC may behave
- differently from the one you are using now. If Microsoft were
- to modify BASIC such that the limit condition were reevaluated at the NEXT
- statement, your code would no longer work. It is also considered bad
- practice to modify the loop counter variable itself (X% in the previous
- examples). However, this causes no real harm, and you should not be afraid
- to do that if the situation warrants it. Of course, changing the loop
- counter will affect the number of times the loop is executed.
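- For instance, the earlier FOR/NEXT example could be recoded along these
- lines, assuming you really do want the new limit to take effect. Because
- DO WHILE tests X% against the current value of Limit% on every pass, the
- body here runs five times rather than ten.
-
-
- Limit% = 10
- X% = 1
- DO WHILE X% <= Limit%     'Limit% is read again on every pass
-   Limit% = 5
-   PRINT Limit%
-   X% = X% + 1
- LOOP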
-
-
- IF/THEN/ELSE AND SELECT CASE
-
- BASIC provides two methods for testing conditions in a program, and
- executing different blocks of code based on the result. The most common
- method is the IF test, which can be used on a single variable, the result
- of an expression, the returned value from a function, or any combination
- of these. I won't belabor the most common uses for IF here, but I do want
- to point out some of its less obvious properties. Also, there are some
- situations where IF and ELSEIF are appropriate, and others where their
- counterpart, SELECT CASE, is better.
- As you have already learned, a simple IF test will in most cases be
- translated into the equivalent assembler instructions directly. In some
- cases, however, the condition you specify is tested, while in others the
- *opposite* condition is tested. If you say IF X > 10 THEN GOTO Label,
- BASIC may change that to IF X <= 10 GOTO [next statement]. Which BASIC
- uses depends on what you will do if the condition is true, and how far away
- in the generated code the statements that will be executed are located.
- When a GOTO is to be performed if the test passes, then the relative
- position of the target label is also a factor.
- A jump to a location either ahead in the code or more than 128 bytes
- backwards requires BASIC to generate more code. The 128 byte displacement
- is significant, because the 80x86 can perform a *conditional jump* to an
- address only a limited distance away. That is, after a comparison is made,
- the target address for a conditional jump such as "Jump if Greater" must
- be no more than that many bytes distant. However, an unconditional jump
- can be to any address within the same 64K code segment. (Bear with me for
- a moment, because the significance of this will soon become apparent.)
- This is shown in the listing that follows.
-
-
- IF X% = 100 THEN
- CMP Word Ptr [X%],64 ;compare X% to 100
- JE 003A ;jump ahead if equal
- JMP Label ;else, skip ahead
- 003A: ;BASIC made this label
- Y% = 2
- MOV Word Ptr [Y%],2
- END IF
-
- Label:
- IF X% > 8 GOTO Label
- CMP Word Ptr [X%],8 ;compare X% to 8
- JG Label ;jump back if greater
-
-
- In the first example above, BASIC compares the value of X% to 100 (64 Hex),
- and if equal jumps ahead to a label it created at address 003A Hex.
- Otherwise, a jump is made to the next statement in the program, which in
- this case is a named label. Although using two jumps may seem
- unnecessarily convoluted, it is necessary because BASIC has no way of
- knowing how many statements will follow at the time it compiles the IF
- test. Thus, it also cannot know whether the statement following the END
- IF will end up being 128 or more bytes ahead.
- By jumping to another, unconditional jump, BC is assured that the
- generated code will be legal. (When BC finally encounters the END IF, it
- goes back to the code it created earlier, and completes the portion of the
- unconditional jump instruction that tells how far to go.) Some compilers
- avoid this situation and create the longer, two-jump code on a trial basis,
- but then go back and change it to the shorter form if possible. These are
- called two-pass compilers, because they process your source code in two
- phases. Unfortunately, current versions of Microsoft BASIC do not use more
- than one pass.
- In the second example Label has already been encountered, and BC knows
- that the label is within 128 bytes. Therefore, it can translate the IF
- statement directly, without having to conditionally jump to yet another
- jump. Had the earlier label been farther away, though, an extra jump would
- have been needed. It is important to understand that forward jumps are
- always handled with more code than is likely necessary, because BASIC does
- not know how far ahead the jump must go. In fact, this same issue must be
- dealt with when writing in assembly language, since the conditional jump
- distance limitation is inherent in the 80x86 microprocessor.
- The bottom line, therefore, is that you can in many cases reduce the
- size of your programs by controlling in which direction a conditional jump
- will be performed. For example, almost all programs must at some point sit
- in a loop waiting until a key is pressed. The next listing shows two
- common ways to do this, with one testing for a key press at the top of the
- loop, and the other doing the test at the bottom.
-
-
- DO UNTIL LEN(INKEY$) ;this comprises 18 bytes
- 0030:
- CALL B$INKY ;call INKEY$
- PUSH AX ;pass the result to LEN
- CALL B$FLEN ;AX now holds the length
- AND AX,AX ;see if it's zero
- JZ 0042 ;yes, jump to LOOP
- JMP 0044 ;no, jump out of loop
- 0042:
- LOOP
- JMP 0030 ;jump back to DO
-
-
- 0044:
- DO ;this is only 15 bytes
- LOOP UNTIL LEN(INKEY$)
- CALL B$INKY ;call INKEY$
- PUSH AX ;as above
- CALL B$FLEN
- AND AX,AX
- JZ 0044 ;jump back if zero
-
-
- Viewed from a purely BASIC perspective, these two examples operate
- identically. But as you can see, the code that BASIC creates is more
- efficient for the second example. When BASIC encounters the first DO
- statement, it has no idea how many more statements there will be until the
- terminating LOOP. Therefore, it has no recourse but to create an extra
- jump. In the second example, the location of the DO is already known to
- be within 128 bytes, so the LOOP test can branch back using the shorter and
- more direct method.
- An ELSEIF statement block is handled in a similar fashion, with code
- that directly compares each condition and branches accordingly. Because
- the code to be executed if the IF is true is always after the IF test
- itself, the less efficient two-jump code must be generated. A simple
- IF/ELSEIF follows, shown as a mix of BASIC and assembly language
- statements.
-
-
- IF X% > 9 THEN
- CMP Word Ptr [X%],9 ;compare X% to 9
- JG 003A ;assign Y% if greater
- JMP 0043 ;else jump to next test
- 003A:
- Y% = 1
- MOV Word Ptr [Y%],1 ;assign Y%
- JMP 0066 ;jump out of the block
- ELSEIF X% > 5 THEN
- 0043:
- CMP Word Ptr [X%],5 ;as above
- JG 004D
- JMP 0066
- 004D:
- Y% = 2
- MOV Word Ptr [Y%],2
- END IF
- 0066:
- ...
- ...
-
-
- Aside from the additional jumping over jumps that are added to all forward
- address references, this code is translated quite efficiently. In this
- situation, the compiled output is identical to that produced had SELECT
- CASE been used. However, there is one important situation in which SELECT
- CASE is more efficient than IF and ELSEIF.
- For each ELSEIF test condition, code is generated to create a separate
- comparison. When a simple comparison such as X% > 9 is being made, only
- one assembly language statement is needed. But when an expression is
- tested--for example, ABS((X% + Y%) * Z%) > 9--identical code is generated
- repeatedly. This is illustrated in the listing that follows.
-
-
- IF ABS((X% + Y%) * Z%) = 5 THEN
- A% = 1
- ELSEIF ABS((X% + Y%) * Z%) = 6 THEN
- A% = 2
- ELSEIF ABS((X% + Y%) * Z%) = 7 THEN
- A% = 3
- END IF
-
-
- Each time BC encounters the expression ABS((X% + Y%) * Z%), it duplicates
- the same assembly language statements. But when SELECT CASE is used, the
- expression is evaluated once, and used for each subsequent test. The first
- example in the next listing shows how SELECT CASE could be used to provide
- the same functionality as the preceding IF/ELSEIF block, but with much less
- code. The second example then shows what SELECT CASE really does, using
- an IF/ELSEIF equivalent.
-
-
- You write it this way:
-
- SELECT CASE ABS((X% + Y%) * Z%)
- CASE 5: A% = 1
- CASE 6: A% = 2
- CASE 7: A% = 3
- CASE ELSE
- END SELECT
-
-
- BASIC really does this:
-
- Temp% = ABS((X% + Y%) * Z%)
- IF Temp% = 5 THEN
- A% = 1
- ELSEIF Temp% = 6 THEN
- A% = 2
- ELSEIF Temp% = 7 THEN
- A% = 3
- END IF
-
-
- As you can see, SELECT CASE evaluates the expression once, stores the
- result in a temporary variable, and then uses that variable repeatedly for
- all subsequent comparisons. Therefore, when the same expression is to be
- tested multiple times, SELECT CASE will be more efficient than IF and
- ELSEIF. This is also true for string expressions and other functions. For
- example, SELECT CASE LEFT$(Work$, 10) will result in less code and faster
- performance than using IF and ELSEIF with that same expression more than
- once.
- Another important feature of SELECT CASE is its ability to use either
- variable or constant test conditions, and to operate on a range of values.
- By contrast, the C language switch statement, which is the equivalent of
- BASIC's SELECT CASE, can use only constant numbers for each test. BASIC is
- particularly powerful in this regard, and allows any legal expression for
- each CASE condition. For example, CASE IS > (Y AND Z) is valid, and so is
- CASE 0 TO Max. CASE also accepts multiple conditions separated by commas
- such as CASE 1, 3, 4 TO 100, -10 TO -1. In this case, the statements that
- follow will be executed if the selected expression equals 1, 3, any value
- between 4 and 100 inclusive, or any value between -10 and -1 inclusive.
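- The fragment below is a small sketch that combines these forms in one
- block. The names Value% and Max% and the value 500 are placeholders chosen
- for this illustration.
-
-
- Max% = 500                       'hypothetical upper bound
- INPUT "Enter a whole number: ", Value%
- SELECT CASE Value%
-   CASE 1, 3, 4 TO 100, -10 TO -1
-     PRINT "Matched the list of values and ranges"
-   CASE IS > Max%
-     PRINT "Greater than Max%"
-   CASE ELSE
-     PRINT "No match"
- END SELECT
-
-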
- It is also worth mentioning here that QuickBASIC version 4.0 contains
- an interesting and irritating quirk that requires a CASE ELSE in the event
- that none of the tests match. Had the CASE ELSE been omitted from the
- earlier SELECT CASE example, and had the value of the expression not been
- between 5 and 7, QuickBASIC 4.0 would issue a "CASE ELSE expected" error at
- run time.
- Fortunately, this has been repaired in QuickBASIC 4.5 and later versions.
- Notice that this is not a bug in QuickBASIC. Rather, it is the behavior
- described in the ANSI (American National Standards Institute) specification
- for BASIC. At the time QuickBASIC 4.0 was introduced, Microsoft mistakenly
- believed the then-proposed ANSI standard for BASIC would be significant.
- As that standard approached fruition, it became clear to Microsoft that the
- only standard most programmers really cared about was Microsoft's.
- One final point I cannot make often enough is the inherent efficiency
- of integer operations and comparisons. This is especially true in the
- comparisons that are made in both IF and CASE tests. In the first example
- below, each of the characters in a string is tested in turn. The second
- example shows a much better way to write such a test, by obtaining the
- ASCII value once and using that for subsequent integer comparisons.
-
-
- Not recommended:
-
- FOR X = 1 TO LEN(Work$)
- SELECT CASE MID$(Work$, X, 1)
- CASE CHR$(9): PRINT "Tab key"
- CASE CHR$(13): PRINT "Enter key"
- CASE CHR$(27): PRINT "Escape key"
- CASE "A" TO "Z", "a" TO "z": PRINT "Letter"
- CASE "0" TO "9": PRINT "Number"
- END SELECT
- NEXT
-
-
- Much more efficient:
-
- FOR X = 1 TO LEN(Work$)
- SELECT CASE ASC(MID$(Work$, X, 1))
- CASE 9: PRINT "Tab key"
- CASE 13: PRINT "Enter key"
- CASE 27: PRINT "Escape key"
- CASE 65 TO 90, 97 TO 122: PRINT "Letter"
- CASE 48 TO 57: PRINT "Number"
- END SELECT
- NEXT
-
-
- In the first program the SELECT itself generates 27 bytes, consisting of
- a call to the MID$ function and then a call to the string assign routine.
- A string assignment is needed to save the MID$ result in a temporary
- variable for the tests that follow. Each CASE test
- that uses CHR$ adds 27 bytes, and this includes the call to CHR$ as well
- as an additional call to the string comparison routine. Testing for the
- letters adds 75 bytes, and testing for the numbers adds 39 more. This
- results in a total code size of 222 bytes, not counting the FOR/NEXT loop.
- Contrast that with only 131 bytes for the second example, in which the
- SELECT portion requires only 26 bytes. Although an extra call is needed
- to obtain the ASCII value of the extracted character, the lack of a
- subsequent string assignment more than makes up for that. Further, the
- tests for 9, 13, and 27 require only 13 bytes each, compared to 27 when
- CHR$ values were used. The letters test requires 43 bytes, and the numbers
- test only 23.
- Clearly this is a significant improvement, especially in light of the
- small number of tests that are being performed here. In a real program
- that performs hundreds of string comparisons, replacing those with integer
- comparisons where appropriate will yield a substantial size reduction.
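- The same idea applies to a single IF test. As a hedged sketch, the
- fragment below checks the first character of a string by its ASCII value
- rather than comparing strings. Note that ASC generates an "Illegal
- function call" error when given a null string, so the length is tested
- first; Work$ and its contents are placeholders.
-
-
- Work$ = "Apple"
- IF LEN(Work$) THEN                 'ASC fails on a null string
-   IF ASC(Work$) = 65 THEN PRINT "Starts with A"
- END IF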
-
-
- AND, OR, EQV, and XOR
-
- When you use AND or OR in an IF test, what is really being compared is
- either 0 or -1. That is, BASIC evaluates the *truth* of each expression
- being tested on both sides of the AND or OR, and a truth in BASIC always
- results in one or the other of these values. Once each expression has been
- evaluated, the results are combined using an assembly language AND or OR
- instruction, and a branch is then made accordingly. Remember that when
- integers are treated as signed, setting all of the bits to 1 results in
- a value of -1.
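- You can see these values directly by printing the result of a comparison.
- The short sketch below assumes nothing beyond an integer variable X%.
-
-
- X% = 5
- PRINT X% > 1                   'displays -1 (True)
- PRINT X% > 9                   'displays 0 (False)
- PRINT (X% > 1) AND (X% > 9)    'displays 0, since -1 AND 0 is 0
- PRINT (X% > 1) OR (X% > 9)     'displays -1, since -1 OR 0 is -1
-
-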
- In Chapter 2 I showed how the various logical operators are used to
- manipulate bits in an integer or long integer variable. The concept is
- identical when these operators are used for decision-making in a BASIC
- program. The difference is really more a matter of semantics than
- definition. That is, the same bit manipulation is performed, only in this
- case on the result of the truth of a BASIC expression. This is shown in
- context below, where two test expressions are combined using AND.
-
-
- IF X% > 1 AND Y% < 2 THEN
- CMP Word Ptr [X%],1 ;compare X% to 1
- MOV AX,0 ;assume False
- JLE 003B ;we assumed correctly
- DEC AX ;wrong, decrement to -1
- 003B:
- CMP Word Ptr [Y%],2 ;now compare Y% to 2
- MOV CX,0000 ;assume False
- JGE 0046 ;we assumed correctly
- DEC CX ;wrong, decrement to -1
- 0046:
- AND CX,AX ;combine the results
- AND CX,CX ;(this is redundant)
- JNZ 004F ;if not 0 assign Z%
- JMP 0055 ;else jump past END IF
- Z% = 3
- 004F:
- MOV Word Ptr [Z%],3 ;assign Z%
- END IF
- 0055:
- ...
- ...
-
-
- The result of the first comparison is saved in the AX register as either
- 0 or -1, and the second is saved in CX using similar code. Once both tests
- have been performed and AX and CX are holding the appropriate values, the
- registers are then tested against each other using AND. The instruction
- AND CX,AX not only combines the results, but it also sets the CPU's Zero
- Flag to indicate if the result was zero or not. Therefore, the second test
- that uses AND to compare CX against itself to check for a zero result is
- redundant. At only 2 additional bytes, the impact on a program's size is
- not terribly significant. However, this shows first-hand the difference
- between code written by a compiler and code written by a person.
- OR conditions are handled similarly, except the assembly language OR
- instruction is used instead of AND. When multiple conditions are being
- tested using combinations of AND and OR and perhaps nested parentheses as
- well, additional similar code is employed.
- There are many situations where all that is really necessary is to test
- for a zero or non-zero condition. For example, it is common to use an
- integer variable as a True/False "flag" which can be set in one part of a
- program, and tested in another. By understanding the underlying code that
- BASIC creates, you can help BASIC to reduce the size of your programs
- enormously. In particular, avoiding a comparison with an explicit value
- lets BASIC generate fewer comparison instructions. The listing below shows
- how you can test multiple flags using AND, but with much less resulting
- code than using an explicit comparison.
-
-
- IF Flag1% AND Flag2% THEN
- MOV AX,[Flag2%] ;move Flag2% into AX
- AND AX,[Flag1%] ;AND that with Flag1%
- AND AX,AX ;(this is redundant)
- JNZ 0063 ;if not zero assign Z%
- JMP 0069 ;else skip past END IF
- Z% = 3
- 0063:
- MOV Word Ptr [Z%],3
- END IF
- 0069:
- ...
- ...
-
-
- The key here is that zero is always used to represent False, and -1 to
- represent a True condition. That is, instead of writing IF Flag1% = -1 AND
- Flag2% = -1, using IF Flag1% AND Flag2% provides the same results, as long
- as each flag holds only 0 or -1. At only
- 20 bytes of generated code, this method is far superior to tests for an
- explicit -1 which require 37 bytes. If you recall, in Chapter 2 I showed
- how the various bits in a variable can be turned on or off with AND. Thus,
- 1111 AND 1111 equals 1111, while 1111 AND 0000 equals 0.
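- As a small illustration of why the flag values matter, the sketch below
- uses the made-up values 1 and 2. Both flags are nonzero, yet 1 AND 2 is 0
- because the two values share no bits, so the shorthand test fails.
-
-
- Flag1% = 1
- Flag2% = 2
- IF Flag1% AND Flag2% THEN PRINT "Shorthand passed"      'not displayed
- IF Flag1% <> 0 AND Flag2% <> 0 THEN PRINT "Both nonzero"  'displayed
-
-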
- Notice that using 0 and -1 has many other benefits as well. For
- example, the NOT operator which was also described in Chapter 2 can toggle
- a variable between those values. If all of the bits in a variable are
- presently zero, then NOT Variable% results in all ones (-1). This property
- can also be used to enhance a program's readability, by using NOT much like
- you would in an English sentence. For example, the code following the line
- IF NOT Flag% THEN will be executed if Flag% is 0 (False), but it will not
- be executed if Flag% is -1 (True).
- In fact, an explicit comparison is optional if you need to test only for
- a non-zero value. IF Variable <> 0 THEN can be reduced to IF Variable
- THEN, and the statements that follow will be executed as long as Variable
- is not 0. Notice that the only saving here is in the BASIC source, since
- either comparison creates ten bytes of assembler code. But when using long
- integers, the short form saves five bytes--14 bytes versus 19 for an
- explicit comparison to zero.
- NOT is equally valuable when toggling a flag variable between two
- values. If you have, say, an input routine that keeps track of the Insert
- key status, then you could use Insert% = NOT Insert% each time you detect
- that the Insert key was pressed. The first time the operator presses that
- key, the Insert flag will be switched from the default start-up value of
- 0 to -1. Then using Insert% = NOT Insert% a second time will revert the
- bits back to all zeros. In fact, it is a common technique to define True
- and False variables (or constants) in a program using this:
-
- False% = 0
- True% = NOT False%
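-
- If you prefer constants, the same definitions might be written with CONST
- as sketched below; the Insert% lines simply repeat the toggle described
- above.
-
- CONST False% = 0, True% = NOT False%
- Insert% = False%
- Insert% = NOT Insert%     'Insert% is now -1 (True)
- Insert% = NOT Insert%     'Insert% is back to 0 (False)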
-
- Most programmers understand how to use parentheses to force a particular
- order of evaluation. By default, BASIC performs multiplication and
- division before it does addition and subtraction. When operators of the
- same precedence are being used, then BASIC simply works from left to right.
- However, the order in which logical comparisons are made is not always
- obvious. This can become particularly tricky if you are using some of the
- shorthand methods I described earlier.
- For example, consider the statements IF X AND Y > 12, IF NOT X OR Y, and
- IF X AND Y OR Z. In the first example, the truth of the expression Y > 12
- is evaluated first, with a result of either 0 or -1. Then, that result is
- combined logically with the value of X using AND. The resulting order of
- evaluation is performed as if you had used IF X AND (Y > 12). The other
- expressions are evaluated as IF (NOT X) OR Y and IF (X AND Y) OR Z.
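- A quick way to confirm these rules is to print an expression with and
- without parentheses. The values 4 and 20 below are arbitrary, chosen so
- that the two groupings give different answers.
-
-
- X% = 4: Y% = 20
- PRINT X% AND Y% > 12       'displays 4, same as X% AND (Y% > 12)
- PRINT (X% AND Y%) > 12     'displays 0, because 4 AND 20 is 4
- PRINT NOT X% OR Y%         'displays -1, same as (NOT X%) OR Y%
- PRINT NOT (X% OR Y%)       'displays -21
-
-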
- The last logical operators we will consider are EQV and XOR. These are
- used rarely by most BASIC programmers, probably because they are not well
- understood. However, EQV can dramatically reduce the size of a program in
- certain circumstances. It is not uncommon to test if two conditions are
- the same, whether True or False. EQV stands for Equivalent, meaning it
- tests if the expressions are the same--either both true or both false. All
- three program fragments below serve the same purpose. However, the first
- generates 57 bytes, while the second and third create only 16 bytes.
-
-
- IF (X = -1 AND Y = -1) OR (X = 0 AND Y = 0) THEN
- ...
- END IF
-
- IF X EQV Y THEN
- ...
- END IF
-
- IF NOT (X XOR Y) THEN
- ...
- END IF
-
-
- Although these examples could be replaced with a simple comparison that
- tests if X equals Y, EQV can reduce other, more elaborate AND and OR tests.
- For example, you could replace this:
-
- IF (X = 10 AND Y = 100) OR (X <> 10 AND Y <> 100)
-
- with this:
-
- IF X = 10 EQV Y = 100
-
- and gain a handsome reduction in code size. Notice that because of the way
- EQV works, the third example in the listing above produces the same
- assembly language code as the second. XOR is true only when the two
- conditions are different, thus NOT XOR is true when they are the same.
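- To see this for yourself, the sketch below prints both forms for all four
- combinations of True and False. The last line is a caution I have added:
- EQV works on every bit of its operands, so it gives a trustworthy
- same-or-different answer only when each side is 0 or -1, as the result of
- a comparison always is.
-
```basic
DEFINT A-Z

' Show X, Y, and the result of each test for every True/False pairing.
FOR X = -1 TO 0
  FOR Y = -1 TO 0
    PRINT X; Y; (X EQV Y); NOT (X XOR Y)
  NEXT
NEXT

' With other values the bitwise nature of EQV shows through: 1 and 2 are
' different, yet the result is -4, which an IF test treats as true.
PRINT 1 EQV 2
```
-
- The last two columns match on every line: -1 when X and Y are the same,
- and 0 when they differ.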
- One final point worth mentioning is that you can assign a variable based
- on the truth of one or more expressions. As you saw earlier, every IF test
- that is used in a BASIC program adds a minimum of 3 extra bytes for a
- second, unconditional jump. That additional code can be avoided in many
- cases by assigning a variable based on whether a particular condition is
- true or not. In the code examples that follow, both program fragments do
- the same thing, except the first requires 25 bytes compared to only 14 for
- the second.
-
-
- IF Variable = 20 THEN
- Flag = -1
- ELSE
- Flag = 0
- END IF
-
-
- Flag = (Variable = 20)
-
-
- In either case, the truth of the expression Variable = 20 must be
- evaluated. However, the IF method adds code to jump around to different
- addresses that assign either -1 or 0 to Flag. The second example simply
- assigns Flag directly from the 0 or -1 result of the truth test. Other
- variants on this type of programming are statements such as A = (B = C),
- and Flag = (LEN(Temp$) <> 0 AND Variable < 50). Note that the surrounding
- parentheses are shown here for clarity only, and BASIC produces the same
- results without them.
-
-
- Short Circuits
-
- There is one important point regarding AND testing you should be aware of.
- Although the code that BASIC creates to implement these logical tests is
- very efficient, in some cases a different approach can yield even better
- results. When many conditions are tested, QuickBASIC creates assembly
- language code to evaluate all of them before making a decision. This can
- be wasteful, because often one of the conditions will be false, negating
- a need to test the remaining conditions. For example, this statement:
-
- IF Any$ = "Quit" AND IntVar% > 100 AND Float! <> 0 THEN PRINT "True"
-
- requires that all three conditions be tested before the program can
- proceed. But if Any$ is not equal to "Quit", there is no reason to
- spend time evaluating the other tests.
- The solution is to instead use nested IF tests, preferably placing the
- most likely (or simplest) tests first, as shown below.
-
-
- IF Any$ = "Quit" THEN
- IF IntVar% > 100 THEN
- IF Float! <> 0 THEN
- PRINT "True"
- END IF
- END IF
- END IF
-
-
- Here, if the first test fails, no additional time is wasted testing the
- remaining conditions. Further, using the nested IF tests with QuickBASIC
- also results in less code: 50 bytes versus 64. Note, however, that BASIC
- PDS [and VB/DOS] incorporate a technique known as *short circuit expression
- evaluation*, which generates slightly more efficient code when AND is used.
- With the newer compilers, each condition is tested in sequence, and the
- first one that fails causes the program to skip over the code that prints
- "True". But even with this improved code generation, you should still
- place the most likely tests first.
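- Nested tests can also protect a later test from being evaluated at all.
- The sketch below is my own illustration, with Work$ as a hypothetical
- variable name. It relies on the behavior described above, where QuickBASIC
- evaluates every term of an AND expression, together with the fact that ASC
- reports an "Illegal function call" error when given a null string.
-
```basic
' Written as a single AND test, this line would halt with an error under
' QuickBASIC whenever Work$ is null, because ASC is evaluated regardless:
'
'   IF LEN(Work$) > 0 AND ASC(Work$) = 65 THEN PRINT "Starts with A"
'
' Nesting the tests skips ASC entirely when the string is empty.
Work$ = ""

IF LEN(Work$) > 0 THEN
  IF ASC(Work$) = 65 THEN PRINT "Starts with A"
END IF

PRINT "Done"
```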
-
-
- ON GOTO AND ON GOSUB STATEMENTS
-
- The last non-procedural control flow statements I will discuss here--ON
- GOTO and ON GOSUB--are used infrequently by many BASIC programmers. But
- when you need to test many different values *and* those values are
- sequential, ON GOTO and ON GOSUB can substantially reduce the amount of
- code that BASIC generates. For clarity, I will use ON GOTO for most of the
- examples that follow. Both work in a similar fashion, except that with ON
- GOSUB execution resumes at the next BASIC statement when the subroutine
- returns.
- You have already seen that IF/ELSEIF and SELECT CASE blocks are not as
- efficient as they could be, because the compiler does not know how far
- ahead the END IF or END SELECT statements are located. Therefore, no
- matter how trivial the IF or CASE tests being performed are, a pair of
- jumps is always created even when a single jump would be sufficient.
- Further, when many tests are necessary, there is no avoiding at least some
- amount of code for each comparison. This is where ON GOTO can help.
- Rather than perform a series of separate tests for each value being
- compared, ON GOTO uses a lookup table which is embedded in the code
- segment. This table is merely a list of addresses to branch to, based on
- the value of the variable or expression being evaluated. If the value
- being tested is 1, then a branch is taken to the first label in the list.
- If it is 2, the code at the second label is executed, and so forth.
- As many as 60 labels can be listed in an ON GOTO statement, although the
- number being tested can range from 0 to 255. If the value is 0 or higher
- than the number of items in the list, the ON GOTO command is ignored, and
- execution resumes with the statement following the ON GOTO. Negative
- values or values higher than 255 cause an "Illegal function call" error.
- A simple example showing the basic syntax for ON GOTO is shown below.
-
-
- INPUT "Enter a value between 1 and 3: ", X
- ON X GOTO Label1, Label2, Label3
- PRINT "Illegal entry!"
- END
-
- Label1:
- PRINT "You pressed 1"
- END
-
- Label2:
- PRINT "You pressed 2"
- END
-
- Label3:
- PRINT "You pressed 3"
- END
-
-
- Notice that the more labels there are, the bigger the savings in code size.
- ON GOTO adds a fixed overhead of 70 bytes, 61 of which is the size of the
- library routine that evaluates the value and actually jumps to the code at
- the appropriate label. The remaining 9 bytes are needed to load the value
- being tested and pass that on to the ON GOTO routine. However, for each
- label in the list, only 2 bytes are required in the lookup table to hold
- the address.
- Compare that to SELECT CASE which requires 6 bytes of set-up code (when
- an integer is being tested), and 13 bytes more to process each CASE. Thus,
- the crossover point at which ON GOTO is more efficient is when there are
- 6 or more comparisons. Notice that if ON GOTO is used in more than one
- place in a program, the savings are even greater because the 61-byte
- library routine is added only once.
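- The crossover point can be confirmed with a few lines of BASIC that simply
- tabulate the byte counts quoted above. The constant names are my own, and
- the figures are the ones given in this section, so treat the output as an
- estimate rather than a measurement of your compiler.
-
```basic
' Byte counts quoted in the text; actual sizes may vary by compiler version.
CONST OnGotoFixed% = 70      'library routine (61) plus set-up code (9)
CONST OnGotoEach% = 2        'one table entry per label
CONST SelectFixed% = 6       'set-up code for an integer SELECT CASE
CONST SelectEach% = 13       'code needed for each CASE

PRINT "Tests", "ON GOTO", "SELECT CASE"
FOR N% = 1 TO 8
  PRINT N%, OnGotoFixed% + OnGotoEach% * N%, SelectFixed% + SelectEach% * N%
NEXT
```
-
- With five tests ON GOTO needs 80 bytes against 71 for SELECT CASE, but with
- six tests the totals are 82 and 84, and the gap widens from there.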
- Again, ON GOTO has the important restriction that all of the values must
- be sequential. However, this limitation can also be turned into a feature
- by taking advantage of the inherent efficiency of lookup tables.
- Using a lookup table is a very powerful technique, because you can
- determine a result using an index rather than actually calculating the
- answer. A lookup table is commonly used to determine log and factorial
- functions, since those calculations are particularly tedious and time
- consuming. With a lookup table you would calculate all of the values once
- ahead of time, and fill an array with the answers. Then, to determine the
- factorial for, say, the number 14, you would simply read the answer from
- the fourteenth element in the array.
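- A minimal sketch of such a table is shown below. The array name and the
- upper limit of 15 are my own choices. A double precision array is used
- because factorials quickly outgrow both integer types.
-
```basic
DEFINT A-Z
DIM Factorial#(0 TO 15)

Factorial#(0) = 1
FOR N = 1 TO 15                  'fill the table once, ahead of time
  Factorial#(N) = Factorial#(N - 1) * N
NEXT

PRINT Factorial#(14)             'no calculation, just read the answer
```
-
- Once the table has been filled, every later lookup costs only a single
- array access, no matter how expensive the original calculation was.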
- You can apply this same technique in BASIC using a combination of INSTR
- and ON GOTO or ON GOSUB. Although INSTR is intended to find the position
- of one string within another, it is also ideal for looking up characters
- in a table. Imagine you have written an input routine that must handle a
- number of different keys, and branch according to which one was pressed.
- One way would be to use an IF/ELSEIF or SELECT CASE block, with one section
- devoted to each possible key. But as you saw earlier, once there are more
- than 5 keys to be recognized, either of those constructs is less efficient
- than ON GOTO.
- The approach I often use is to combine INSTR and ON GOSUB to branch
- according to which function key was pressed. The beauty of this method is
- that a value of zero (or one that is out of range) causes control to fall
- through to the next statement. Therefore any keys that are not explicitly
- being tested for are simply ignored. This is shown in context below.
-
-
- DO
-
- DO 'wait for a key press
- K$ = INKEY$
- Length% = LEN(K$)
- LOOP UNTIL Length%
-
- IF Length% = 2 THEN 'it's an extended key
- Code$ = RIGHT$(K$, 1) 'isolate the key code and branch accordingly
- ON INSTR(";<=>?@ABCD", Code$) GOSUB ...
- END IF
-
- LOOP UNTIL K$ = CHR$(27) 'until they press Esc
-
-
- Here, extended keys are identified by a length of 2, and the key code is
- then isolated with RIGHT$. The punctuation and letters within the quotes
- are characters 59 through 68, which correspond to the extended codes for
- F1 through F10. (A list of all the extended key codes is in your BASIC
- owner's manual.) Of course, any arbitrary list of key codes could be used.
- Further, the key codes do not need to be contiguous. For example, to
- branch on the Up arrow, Down arrow, Ins, Del, PgUp, and PgDn keys you would
- use "HPRSIQ" as the source string. Any other mix of characters could also
- be used, including Alt keys.
- Another interesting and clever trick that combines INSTR and ON GOTO
- lets you test multiple keys regardless of capitalization. The short program
- below accepts a character, and uses INSTR to look it up in a table of upper
- and lower case character pairs.
-
-
- PRINT "Yes/No/Load/Save/Retry/Quit? ";
-
- DO
- K$ = INKEY$
- LOOP UNTIL LEN(K$) = 1
-
- ON (INSTR("YyNnLlSsRrQq", K$) + 1) \ 2 GOTO ...
-
-
- After adding 1 and dividing that by 2, the result will indicate in which
- character pair the choice was found. This technique could also be extended
- to include 3- or 4-character groups, or any other combination of
- characters. Since any value between 0 and 255 is legal for an ASCII
- character, INSTR can be used in other, more general lookup situations as
- well.
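-
- The complete sketch below shows the paired lookup in action without
- waiting at the keyboard. The test string, the label names, and the use of
- ON GOSUB rather than ON GOTO are assumptions of mine, chosen only to keep
- the demonstration short.
-
```basic
DEFINT A-Z
Test$ = "yNqx"                   'sample keystrokes, one of them invalid

FOR I = 1 TO LEN(Test$)
  K$ = MID$(Test$, I, 1)
  PRINT K$; " -> ";
  Choice = (INSTR("YyNnQq", K$) + 1) \ 2   '1 = Yes, 2 = No, 3 = Quit
  ON Choice GOSUB DoYes, DoNo, DoQuit
  IF Choice = 0 THEN PRINT "ignored"       'ON GOSUB fell through
NEXT
END

DoYes:
  PRINT "Yes"
  RETURN

DoNo:
  PRINT "No"
  RETURN

DoQuit:
  PRINT "Quit"
  RETURN
```
-
- The letter x is not in the table, so INSTR returns 0, the expression
- evaluates to 0, and ON GOSUB does nothing.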
-
-
- A COMPARISON OF SUBROUTINE METHODS
- ==================================
-
- There are four primary subroutine types that BASIC supports: GOSUB
- subroutines, DEF FN functions, called subprograms, and what I refer to as
- "formal functions". Each has its own advantages and disadvantages, which
- I will describe momentarily. But I would first like to introduce several
- terms that will be used throughout the discussion that follows.
- The first is *module*, which is a series of BASIC program statements
- kept in their own separate source file. All modules have a main portion,
- and some also have procedures within a SUB or FUNCTION block. The main
- portion of a program is that which receives control when the program is
- first run. When a program is composed of multiple modules, each
- additional module has a main portion, although code within that portion is
- rarely executed. In fact, there are only two ways to access code in the
- main portion of an ancillary module: One is to create a line label and use
- that as the target for ON ERROR or another "ON" event. The other is to
- define a DEF FN function and invoke the function.
- The second term is *variable scope*, which indicates where in a program
- a variable may be accessed. Variables that are used in the main portion
- of a program are accessible anywhere else in the main portion, but not
- within a SUB or FUNCTION block. Likewise, a variable that is defined within
- a SUB or FUNCTION is by default private to that procedure. The overwhelming
- advantage of private variables is that you do not have to worry about
- errors caused by inadvertently using the same variable name twice.
- The third term is *SHARED*, and it overrides the default private scope
- of a variable used in a procedure. SHARED may be used in either of two
- ways. If it is specified with a DIM statement in the main body of a
- program--that is, DIM SHARED Variable--the variable is established as being
- shared throughout the entire source file. Even though DIM is usually
- associated with arrays, it can be used this way to extend a variable's
- scope.
- SHARED may also be used within a subroutine to share one or more
- variables with the main portion. Notice that the statement SHARED Variable
- inside a procedure defines the variable as being shared with the main
- portion of the program only. SHARED used within a procedure does not share
- the named variable with any other procedures. The only exception is when
- other procedures also use SHARED with the same variable name. In that case
- they are shared between procedures, as well as with the main program.
-
-           ╔═════════════════════════════╗
-           ║ DEFINT A-Z                  ║
-           ║ DIM SHARED Var1             ║
-           ║                             ║
-        ┌──╫──>Var1 = 100                ║
-     ┌──│──╫──>Var2 = 200                ║
-     │  │  ║   CALL Sub1(Var2)           ║
-     │  │  ║   CALL Sub2(Var2)           ║
-     │  │  ║   END                       ║
-     │  │  ║                             ║
-     │  │  ║ SUB Sub1 (Param) STATIC     ║
-     │  ├──╫────>Var1 = Param            ║
-     │  │  ║     Var2 = Var1             ║
-     │  │  ║ END SUB                     ║
-     │  │  ║                             ║
-     │  │  ║ SUB Sub2 (Param) STATIC     ║
-     │  │  ║     SHARED Var2             ║
-     │  └──╫────>Var1 = Param            ║
-     └─────╫────>Var2 = Var1             ║
-           ║ END SUB                     ║
-           ╚═════════════════════════════╝
-
- Figure 3-1: How SHARED and DIM SHARED affect variable scope. Variables
- that share the same identity are shown connected.
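-
- The same relationships can be watched at run time. The sketch below is a
- simplified variation of the figure that I have reduced to the essentials:
- the parameters are dropped, and each assignment uses a different value so
- you can tell which variable was changed.
-
```basic
DEFINT A-Z
DIM SHARED Var1                  'known to the main program and all procedures

Var1 = 100
Var2 = 200

CALL Sub1
PRINT Var1; Var2                 'Var1 is now 1, Var2 is still 200

CALL Sub2
PRINT Var1; Var2                 'Var1 is now 3, Var2 is now 4
END

SUB Sub1 STATIC
  Var1 = 1                       'the module-wide shared Var1
  Var2 = 2                       'a private Var2, unrelated to the main one
END SUB

SUB Sub2 STATIC
  SHARED Var2                    'use the main program's Var2 here as well
  Var1 = 3
  Var2 = 4
END SUB
```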
-
- The fourth term is *COMMON*, which is related to SHARED in that it also
- lets you share variables among procedures. However, COMMON has the
- additional property of allowing variables to be shared by procedures that
- are not in the same physical source file. When BC compiles your program,
- it translates your variable names to memory addresses. Thus, those names
- are not available when the program is linked to other object files.
- Variables that are listed in a COMMON statement are placed in a separate
- portion of the data segment which is reserved just for that purpose.
- Therefore, other program modules using COMMON can also access those
- variables in that portion of DGROUP.
-
-            MODULE1.BAS
-           ╔═════════════════════════════╗
-           ║ DEFINT A-Z                  ║
-           ║ COMMON SHARED Var1          ║
-           ║                             ║
-     ┌─────╫──>Var1 = 100                ║
-     │  ┌──╫──>Var2 = 200                ║
-     │  │  ║   CALL Sub1(Var2)           ║
-     │  │  ║   CALL Sub2(Var2)           ║
-     │  │  ║   END                       ║
-     │  │  ║                             ║
-     │  │  ║ SUB Sub1 (Param) STATIC     ║
-     ├──│──╫────>Var1 = Param            ║
-     │  │  ║     Var2 = Var1             ║
-     │  │  ║ END SUB                     ║
-     │  │  ║                             ║
-     │  │  ║ SUB Sub2 (Param) STATIC     ║
-     │  │  ║     SHARED Var2             ║
-     ├──│──╫────>Var1 = Param            ║
-     │  └──╫────>Var2 = Var1             ║
-     │     ║ END SUB                     ║
-     │     ╚═════════════════════════════╝
-     │
-     │      MODULE2.BAS
-     │     ╔═════════════════════════════╗
-     │     ║ DEFINT A-Z                  ║
-     │     ║ COMMON Var1                 ║
-     │     ║                             ║
-     └─────╫──>Var1 = 100                ║
-        ┌──╫──>Var2 = 200                ║
-        │  ║   CALL Sub1(Var2)           ║
-        │  ║   CALL Sub2(Var2)           ║
-        │  ║   END                       ║
-        │  ║                             ║
-        │  ║ SUB Sub1 (Param) STATIC     ║
-        │  ║     Var1 = Param            ║
-        │  ║     Var2 = Var1             ║
-        │  ║ END SUB                     ║
-        │  ║                             ║
-        │  ║ SUB Sub2 (Param) STATIC     ║
-        │  ║     SHARED Var2             ║
-        │  ║     Var1 = Param            ║
-        └──╫────>Var2 = Var1             ║
-           ║ END SUB                     ║
-           ╚═════════════════════════════╝
-
- Figure 3-2: How COMMON and COMMON SHARED affect variable scope. Variables
- that share the same identity are shown connected.
-
- COMMON can also be combined with SHARED, to specify that one or more
- variables be shared throughout the main program as well as with other
- modules. That is, the statement COMMON SHARED Variable tells BASIC that
- Variable is to be both DIM SHARED and COMMON. To establish a TYPE variable
- as COMMON, you must state the type name as well: COMMON TypeVar AS MyType.
- In all cases, COMMON statements must precede the executable statements in
- a program. The only statements that may appear before COMMON are other
- non-executable statements such as DECLARE, CONST, and '$STATIC.
- Because the variable names listed in a COMMON statement are not stored
- in the final program, the names used in one module do not need to be the
- same as the corresponding names in another module. You could, for example,
- have COMMON X%, Y$, Z# in one file, and COMMON A%, B$, C# in another.
- Here, X% refers to the same memory location as A%; Y$ is the same variable
- as B$, and so forth. It is imperative, however, that the order and type
- of variables match. If one file has an integer followed by a string
- followed by a double precision variable, then all other files containing
- a COMMON statement must have their COMMON variables in that same order.
- This is one good reason for storing all COMMON statements in a single
- include file, which is included by each module that needs access to the
- COMMON variables.
- One or more arrays may also be listed as COMMON; however, the rules are
- different for static and dynamic arrays. When a dynamic array is to be made
- COMMON, it should be dimensioned in the main program only, following the
- COMMON statement. (But you may use REDIM in another module if necessary,
- to change the array's size.) Static arrays must be dimensioned in each
- module, before the associated COMMON declaration. Of course, all array
- types must match across modules--you may not list a static array as the
- first COMMON item in one file, and then list a dynamic array in that same
- position in another file.
- There are actually two forms of COMMON statement: the blank COMMON and
- the named COMMON. The examples shown thus far are blank COMMON statements.
- A named COMMON block lets you specify selected variable groups as COMMON,
- to avoid having to list many variables when all of them are not needed in
- a given module. A COMMON block is named by preceding the variable list
- with a name surrounded by slash characters. For instance, this line:
-
- COMMON /IntVars/ X%, Y%, Z%
-
- establishes a named COMMON block called IntVars. By creating several such
- named blocks you may share only those that are actually needed in a given
- module.
- In this case, the block name is stored in the object file, and LINK
- ensures that the COMMON variables in each module share the same addresses.
- One important limitation of a named COMMON block is that it cannot be used
- to pass information between programs that use CHAIN.
- The fifth term is *STATIC*, which I described in a slightly different
- context in the section about data in Chapter 2. When you add the STATIC
- option to a SUB or FUNCTION definition, BASIC treats the variables within
- that procedure very differently than when STATIC is omitted. With STATIC,
- memory in DGROUP is allocated by the compiler for each variable, and that
- memory is permanently reserved for use by those variables.
- When STATIC is not specified, the variables in the routine are by
- default placed onto the system stack. This means that sufficient stack
- memory must be available, although that memory can then be used again later
- for variables in other procedures. An important side effect of using the
- stack for variable storage is that the memory is cleared each time the
- subprogram or function is entered. Therefore, all numeric variables are
- initialized to zero, and strings are initialized to null. Any arrays
- within a non-static procedure are by default dynamic, which means they are
- created upon entry to the routine and erased when the routine exits.
- STATIC also has an additional meaning in subprograms and functions; it
- can establish variables as being private to a procedure. If a variable has
- been declared as shared throughout a module by using DIM SHARED in the main
- portion of the program, using the statement STATIC Variable inside the
- subroutine will override that property. Thus, Variable will be local to
- the procedure, and will not conflict with a global shared variable of the
- same name. STATIC within a subprogram or function also lets you use the
- same name for a variable that was already given to a named constant.
- Many programmers find the use of the term STATIC for two very different
- purposes confusing, and rightly so. It would have made more sense to use
- a different keyword, perhaps LOCAL, to limit a variable's scope. And to
- further confuse the issue, the '$STATIC metacommand is used to establish
- the memory storage method for arrays. Nonetheless, STATIC always
- indicates that memory for a variable is permanently allocated, and it may
- also specify that a variable is private to a procedure.
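- The difference between the two storage methods is easy to demonstrate.
- In the sketch below, which uses procedure names of my own invention, the
- only difference between the two subprograms is the STATIC option.
-
```basic
FOR I% = 1 TO 3
  CALL CountStatic
  CALL CountStack
NEXT
END

SUB CountStatic STATIC
  N% = N% + 1                    'N% lives in DGROUP and keeps its value
  PRINT "Static:"; N%
END SUB

SUB CountStack
  N% = N% + 1                    'N% is on the stack, cleared on every entry
  PRINT "Non-static:"; N%
END SUB
```
-
- The static version counts 1, 2, 3 because its variable survives between
- calls. The non-static version reports 1 every time.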
- The final term I want to introduce now is *recursion*. The classic
- definition of a recursive procedure is that it may call itself. While this
- is certainly true, that doesn't really explain what recursion is all about,
- or how it could be useful. I will cover recursion in depth momentarily,
- but for now suffice it to say that recursion is often helpful when
- manipulating tree-structured information.
- For example, a program that lists all of the files on a hard disk would
- most likely be based on a recursive subroutine. Such a program would first
- change to the root directory, and then call the routine to read and display
- all of the file names it finds there. Then for each directory under the
- current one, the routine would change to that directory and call itself
- again to read and display the files in that directory. And if more
- directories were found at the next level down, the routine would call
- itself yet again to process all of those files too. This continues until
- all of the files in all directories on the hard disk have been processed.
- Another application for recursion is a subroutine that sorts an array
- on more than one key. For example, consider a TYPE array in which each
- element has components for a first name, a last name, and address fields.
- You might want to be able to sort that array first by last name, then by
- first name, and then by zip code. That is, all of the Smiths would be
- grouped together, and within that group Adam would be listed before John.
- All of the John Smiths would in turn be sorted in zip code order.
- By employing recursion, the routine would first sort the entire array
- based on the last name only. Next, it would identify each range of
- elements that contain identical last names. The routine would then call
- itself to sort that subgroup, and call itself again to sort the subgroup
- within that group based on zip code.
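-
- Both of those programs are too long to show at this point. As a much
- smaller stand-in of my own choosing, the function below computes a
- factorial by calling itself. Each call handles one value and hands the rest
- of the problem to the next call down.
-
```basic
DECLARE FUNCTION Factorial# (N%)

PRINT Factorial#(5)              '5 * 4 * 3 * 2 * 1 = 120

' STATIC is omitted, as is usual for a recursive procedure, so any local
' variables would be created anew on the stack for each level of the call.
FUNCTION Factorial# (N%)
  IF N% <= 1 THEN
    Factorial# = 1                         'the simplest case ends the chain
  ELSE
    Factorial# = N% * Factorial#(N% - 1)   'otherwise call ourselves again
  END IF
END FUNCTION
```
-
- Every recursive routine needs a test like the first one here, which stops
- the chain of calls before the stack is exhausted.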
-
-
- SUBROUTINES VERSUS FUNCTIONS
-
- There is a fundamental difference between subroutines and functions. A
- subroutine is accessed with either a CALL or GOSUB statement, and a
- function is invoked by referencing its name. In general, a subroutine is
- used to perform an action such as opening a group of files, or perhaps
- updating a screen-full of information. A function, on the other hand,
- returns a value such as the result of a calculation. A string function
- also returns information, although in this case that information is a
- string.
- Notice that the type of information returned by a function is
- independent of the type of parameters, if any, that are passed to it. For
- example, BASIC's native STR$ function accepts a numeric argument but
- returns a string. Likewise, a numeric function such as INSTR accepts two
- strings and returns a single integer. This is also true for functions that
- you design using either DEF FN or FUNCTION.
- Although a function is primarily used for calculations and a subroutine
- for performing one or more actions, there is no hard and fast distinction
- between the two. You could easily design a subroutine that multiplies
- three numbers and returns the answer in one of the parameters. Similarly,
- a function could be written to clear the screen and then open a file.
- Which you use and when will depend on your own programming style. However,
- there are definite advantages to using functions where appropriate.
- One immediately obvious benefit of a function is that a value can be
- returned without requiring an additional passed parameter. Each variable
- that is passed as a parameter requires 4 bytes of code for setup, plus an
- additional 5 bytes within the subroutine each time it is accessed.
- Another important advantage of using a function is BASIC's automatic
- type conversion. If you assign a single precision variable from the result
- of an integer function, BASIC will convert the data from one format to the
- other transparently. In fact, a simple assignment from a variable of one
- type to that of another type is also handled for you by the compiler. But
- if a routine is written to pass the value back as a parameter, then you
- must use whatever type of data the subprogram expects.
- Although most high-level languages require the programmer to match
- explicitly the types of data being assigned, Microsoft BASIC has done this
- automatically since its inception. When you write Var1! = Var2%, BASIC
- treats that as Var1! = CSNG(Var2%). Such automatic type conversion is
- usually called *coercion*, which some object-oriented programming texts
- classify as a simple form of *polymorphism*.
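-
- The sketch below shows the same calculation written both ways. The names
- Multiply and Product% are hypothetical. The subprogram needs a fourth
- parameter of exactly the right type to carry the answer back, while the
- function's result can be assigned to a variable of any numeric type.
-
```basic
DECLARE FUNCTION Product% (A%, B%, C%)

CALL Multiply(2, 3, 4, Answer%)  'the result must come back in an integer
PRINT Answer%

Total! = Product%(2, 3, 4)       'BASIC converts the integer result for us
PRINT Total!
END

SUB Multiply (A%, B%, C%, Result%) STATIC
  Result% = A% * B% * C%
END SUB

FUNCTION Product% (A%, B%, C%) STATIC
  Product% = A% * B% * C%
END FUNCTION
```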
-
-
- GOSUB ROUTINES
-
- The primary advantage a GOSUB routine holds over all of the other
- subroutine types is that it can be accessed very quickly. Translated to
- assembly language a GOSUB statement is but three bytes in length, and its
- speed is surpassed only by a GOTO. When the only thing that matters is how
- fast a subroutine can be called, GOSUB has the clear advantage. However,
- there are many limitations inherent in a GOSUB.
- The most important restriction is that arguments cannot be passed using
- GOSUB. Therefore, any variables must be assigned before invoking the
- routine, and possibly reassigned when it returns. For example, if a
- subroutine requires two parameters--perhaps a row and column at which to
- print a message--those variables must be assigned before the GOSUB can be
- used. And if a value is being returned, your program must know the name
- of the variable that was assigned within the GOSUB routine.
- Another important limitation is that the target line label must be in
- the same block of code as the GOSUB. Although a GOSUB is legal within a
- SUB or FUNCTION, both the GOSUB and the routine it calls must be located
- in the same procedure. Likewise, a GOSUB in the main body of a program
- cannot access a subroutine inside a procedure, or vice versa. [And of
- course you cannot invoke a GOSUB routine that is located in a different
- source module.]
- Both of these problems restrict your ability to reuse a subroutine in
- more than one program. One of the goals of modern structured programming
- is the ability to design a routine for one application, and also use it
- again later in other programs. The only way to do that using GOSUB
- routines is to establish a variable naming convention, and always use
- variables and line labels with those unique names.
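-
- A minimal sketch of the row-and-column example mentioned above follows.
- The variable and label names are arbitrary, which is exactly the problem:
- the calling code and the subroutine must agree on them in advance.
-
```basic
Row% = 10                        'the "parameters" must be assigned by name
Column% = 20
Message$ = "Hello"
GOSUB PrintAt
END

PrintAt:
  LOCATE Row%, Column%           'these are the main program's own variables
  PRINT Message$
  RETURN
```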
-
-
- SUBPROGRAMS
-
- Subprograms were introduced with QuickBASIC version 2.0, and they improve
- greatly on GOSUB routines in many respects. The most important advantages
- of a subprogram are that it accepts passed parameters, and that variables
- used within the subprogram are local by default. Besides the obvious
- benefit of not having to worry about variable naming conflicts, these
- properties allow you to create your own toolbox of useful subroutines, and
- use them repeatedly in different programming projects. I will discuss this
- use of subprograms in detail later in this chapter.
- A subprogram is accessed using the CALL statement, and any number of
- arguments may optionally be passed to the routine. A subprogram is defined
- with a statement of the form SUB SubName (Param1, Param2, ...) STATIC. The
- parameters and surrounding parentheses are optional, as is the STATIC
- directive. Of course, the number of arguments passed to a subprogram must
- match the number of parameters it expects.
- As you can see, subprograms have many advantages over GOSUB routines.
- However, they are not a magical panacea for every programming problem.
- Each subprogram includes a fixed amount of overhead just to enter and exit
- it. Because of the complexities of accessing incoming parameters, a *stack
- frame* must be created by the compiler upon entry. A stack frame is simply
- a fancy name for an area of memory that holds the addresses of the incoming
- parameters. However, this requirement adds a fair amount of code to each
- subprogram.
- Eight bytes of code are needed to set up and call the internal BASIC
- routine that creates the stack frame, and the routine itself comprises
- another 35 bytes. Eight more bytes are needed to call the routine that
- exits a subprogram, and that routine contains 26 bytes. Finally, every
- subprogram but the last in a source file needs a 3-byte jump to skip over
- the other subprograms that follow. Therefore, a total of 80 bytes are
- added to any program that uses a subprogram rather than a GOSUB routine.
- It is important to point out, however, that the 61 bytes used by the
- library routines to enter and exit a subprogram are added to the final .EXE
- file only once.
- It is also worth mentioning that BASIC PDS provides the /Ot switch,
- which eliminates the usual overhead incurred from calling the routines
- needed to enter and exit a subprogram. Although using /Ot avoids the code
- that is otherwise added, there is one important restriction: You may not
- use a GOSUB within the subprogram. When a program performs a GOSUB, the
- address to return to is placed onto the stack, for retrieval later when the
- subroutine returns. Likewise, when a subprogram is called, both a segment
- and address to return to are put on the stack.
- If a GOSUB were used inside the subprogram and an EXIT SUB was then
- encountered within the GOSUBed subroutine, the return addresses on the
- stack would be out of order. Thus, the subprogram would return to the
- wrong place, with undoubtedly disastrous consequences. To avoid this,
- BASIC by default saves the address to return to when the subprogram is
- first entered, and uses that when it is exited. Therefore, when the
- compiler sees that a GOSUB is being used, it does not use the abbreviated
- method even if /Ot has been specified.
- Although using /Ot makes a subprogram (and function) much faster by
- eliminating the overhead to call the entry and exit routines, there is no
- actual saving in code size. A series of assembler NOP (No Operation)
- instructions is placed where the entry and exit code would have been.
- However, those empty instructions are never executed. We can only hope
- that in future releases of BASIC PDS Microsoft will improve BC's code
- generation to eliminate these unnecessary instructions. [Yeah, right.]
- Another problem with subprograms is that programmers tend to use them
- to excess. For example, I have seen people create subprograms to increment
- and decrement integer variables even though it is far more efficient to do
- that with in-line code. The statement X% = X% + 1 creates only 4 bytes of
- code, compared to 9 for a single call to a subprogram to do the same thing!
- However, incrementing long integer or floating point variables does take
- more code than invoking a subprogram with a single parameter, so a
- subprogram could be useful in that case. Only by counting the number of
- times a subprogram will be used and comparing that to the overhead incurred
- can you determine whether there will be any savings.
-
-
- DEF FN FUNCTIONS
-
- Although a DEF FN function is designed to return a result, it is more
- closely related to a GOSUB subroutine in actual operation. Like a GOSUB
- routine it is invoked with a 3-byte assembly language "near" call, as
- opposed to the 5-byte "far" call that subprograms and formal functions
- require. And while a DEF FN function can accept incoming parameters,
- variables within the function definition are by default shared with the
- main portion of the program.
- As I already explained, variables used in a DEF FN function can be made
- private to the function only by explicitly declaring them as STATIC.
- However, at least it is possible to employ local variables. Further, a DEF
- FN function can return a result, which makes it an ideal replacement for
- GOSUB when speed is paramount.
- Internally, parameters are passed to a DEF FN function very differently
- than to a called subprogram or formal function. Arguments are passed to
- a subprogram by placing their addresses on the stack. With a DEF FN
- function, however, a copy of each parameter is created, and the function
- directly manipulates those copies. Therefore, it is impossible for a DEF
- FN function to modify an incoming parameter directly. This behavior is
- neither good nor bad. Rather, it is simply different and thus important
- to understand. It is also important to understand that a DEF FN function
- can be used only in the module in which it is defined. If the same
- function is needed in different modules, the same code must be duplicated
- again and again.
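- The difference can be seen in a minimal sketch such as the one below. The
- names FnBump%, BumpSub, and Count% are hypothetical and serve only to
- illustrate the point.
-
```basic
DECLARE SUB BumpSub (N%)

DEF FnBump% (N%)
   N% = N% + 1                'alters only the function's private copy
   FnBump% = N%
END DEF

Count% = 10
Result% = FnBump%(Count%)
PRINT Count%; Result%         'Count% should be unchanged by the function

CALL BumpSub(Count%)
PRINT Count%                  'the subprogram was given the address of Count%
END

SUB BumpSub (N%) STATIC
   N% = N% + 1                'this assignment reaches the caller
END SUB
```
-
- If a DEF FN function must change a variable in the main program, the usual
- approach is to assign the function's result to that variable.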
- In the manuals that come with QuickBASIC and BASIC PDS, Microsoft
- advises against using DEF FN functions, in favor of the newer, more
- powerful formal functions. Because of this favoritism, Microsoft will
- probably never correct one disturbing anomaly that is present in all DEF
- FN functions. When a string is passed as an argument to a DEF FN function,
- a copy is made for the function to manipulate. Unfortunately, the copy is
- never deleted! Therefore, if you pass, say, a 10,000 byte string to a DEF
- FN function, that amount of memory remains tied up until the function is
- invoked again later. The short listing below proves this behavior.
-
-
- DEF FnWaste (A$)
- FnWaste = ASC(A$)
- END DEF
-
- Big$ = SPACE$(10000)
- PRINT FRE(Big$)
- X = FnWaste(Big$)
- PRINT FRE(Big$)
-
-
- Notice that running this program in the QuickBASIC editing environment will
- not give the expected (memory-wasting) result. However, in a separately
- compiled program the 10000 byte loss will be evident.
- As with subprograms, there is a fixed amount of overhead required to
- enter and exit a DEF FN function. For each function that has been defined,
- 5 bytes are needed to call the Enter and Exit routines. Further, these
- routines are 14 and 24 bytes in length respectively. But again, the
- routines themselves are added to a program only once when it is linked.
- There are two final limitations of DEF FN functions worth mentioning
- here. The first is that arrays and TYPE variables may not be passed as
- parameters to them. Since by design a copy is made of every incoming
- parameter, there is no reasonable way to do that with an entire array. The
- second limitation is that the function definition must be physically
- positioned in the source file before any references are made to it.
-
-
- FORMAL FUNCTIONS
-
- A formal function is nearly identical to a called subprogram, and it
- requires the exact same amount of overhead to enter and exit. Also like
- subprograms, nearly any type of data may be passed to a function, including
- TYPE variables and arrays. The only limitation is that a fixed-length
- string may not be used directly as a parameter. If a fixed-length string
- is passed to a subprogram or function that expects a string, a copy is made
- and assigned to a conventional string. This copying was described in
- detail in Chapter 2.
- Because a formal function is invoked by referencing its name in an
- assignment or PRINT statement, it is essential that it be declared. After
- all, how else could BASIC know that the statement PRINT MyFunc means to
- call a function and display the result, as opposed to printing the variable
- named MyFunc? When a BASIC function is created in the BASIC editing
- environment, a corresponding DECLARE statement is generated automatically.
- But when a function is written in another language or kept in a Quick
- Library, an explicit declaration is mandatory.
- Like subprograms, formal functions are ideally suited to modular,
- reusable programming methods. Furthermore, a function may be accessed from
- any module in an entire application, even those in other source files.
- Indeed, the only difference between a subprogram and a function is that a
- function returns a result. The assembly language code that BASIC generates
- is in all other respects identical.
-
-
- STATIC VERSUS NON-STATIC PROCEDURES
-
- As I stated earlier, when the STATIC keyword is appended to a SUB or
- FUNCTION declaration, all of the variables within the routine are assigned
- a permanent address in DGROUP. And when STATIC is omitted, the variables
- are instead stored on the stack and cleared to zeros or null strings each
- time the routine is entered. There are several important ramifications of
- this behavior. Non-static procedures allocate new stack memory each time
- they are invoked, and then release that memory when they exit. It is
- therefore possible to exhaust the available stack space when the subroutine
- calls are deeply nested.
- For example, if you call one subprogram that then calls another which
- in turn calls yet another, sufficient stack memory must be available for
- all of the variables in all of the subprograms. Besides the memory needed
- for each variable in a subprogram or function, other data is also placed
- onto the stack as part of the call. For each parameter that is passed, 2
- bytes are taken to hold its address. Add to that 4 bytes to store the
- segment and address to return to in the calling program. Finally,
- temporary variables that BASIC creates for its own purposes are also stored
- on the stack in a non-static subprogram or function.
- Another important consideration when STATIC is omitted is that every
- string variable must be deleted before the subprogram exits. Because of
- the way BASIC's string management routines operate, memory that holds
- string descriptors and string data cannot simply be abandoned. Every
- string must be released explicitly by a called routine, at a cost of 9
- bytes per string. Please understand that you do not have to delete these
- strings. Rather, this is another case where BASIC creates additional code
- without telling you.
- Again, I would love to be able to tell you that using STATIC is always
- desirable, or that never using it always makes sense. But unfortunately,
- it just isn't that simple. When a program becomes very large and complex,
- only by counting variables can you be absolutely certain how much stack
- space is really needed. Although the FRE(-2) function may be used to
- determine how much stack memory is currently available, it does not tell
- how much memory is actually needed by each routine.
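- One rough way to observe the stack being consumed is to print FRE(-2) at
- each nesting level, as in the sketch below. Level1 and Level2 are
- hypothetical non-static subprograms; the actual numbers reported will
- depend on the compiler version and options.
-
```basic
DECLARE SUB Level1 ()
DECLARE SUB Level2 ()

PRINT "Main program:"; FRE(-2)
CALL Level1
END

SUB Level1                    'non-static, so A# and B# live on the stack
   A# = 1: B# = 2
   PRINT "In Level1:   "; FRE(-2)
   CALL Level2
END SUB

SUB Level2                    'nested one level deeper
   C# = 3: D# = 4
   PRINT "In Level2:   "; FRE(-2)
END SUB
```
-
- Comparing the figures printed at successive levels gives an approximate
- idea of how much stack each call takes.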
- To summarize the trade-offs between static and non-static variables:
- Static variables are allocated permanently by the compiler, and the memory
- they occupy can never be used for any other purpose. Non-static variables
- are placed onto the stack, and exist only while the subprogram or function
- is in use. Remember that you can also have a mix of static and non-static
- variables in the same procedure. By omitting STATIC after the subroutine
- name, all variables will by default be non-static. You can then override
- that property for selected variables by using the STATIC keyword. In the
- section on debugging in Chapter 4, you will learn how to use CodeView to
- determine the stack requirements for a procedure's variables.
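- As a brief illustration of mixing the two kinds of variables, consider the
- sketch below. The subprogram Counter and its variables are hypothetical.
-
```basic
DECLARE SUB Counter ()

FOR X% = 1 TO 3
   CALL Counter
NEXT
END

SUB Counter                   'STATIC omitted, so variables default to stack
   STATIC Calls%              'this one variable keeps its value
   Calls% = Calls% + 1
   Temp% = Temp% + 1          'stack variable, cleared at each entry
   PRINT Calls%; Temp%
END SUB
```
-
- Here Calls% counts 1, 2, 3 across the three calls, while Temp% is 1 every
- time because it starts at zero on each entry.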
-
-
- Controlling the Stack Size
-
- There are several ways to control the amount of memory that is dedicated
- for use by the stack. All versions of BASIC support the CLEAR command,
- which takes an optional argument that sets the stack size. The statement
- CLEAR , , StackSize sets aside StackSize bytes for the stack.
- Unfortunately, CLEAR also clears all of the data in a program, closes any
- open files, and erases all arrays. If you know ahead of time how much
- stack memory will be needed, then using CLEAR as the first statement in a
- program will not cause a problem.
- Even when CLEAR is used as the first statement in a program, there is
- still one situation where that will not be acceptable. When you use CHAIN
- to execute a subsequent program, a CLEAR statement in that program will
- clear all of the variables that have been declared COMMON. Fortunately,
- there are two solutions to this problem: BASIC PDS offers the STACK
- statement, which lets you establish the size of the stack but without the
- side effects of CLEAR. For example, the statement STACK 5000 sets aside
- 5000 bytes for the stack. The other solution is to use the /STACK: link
- switch, which reserves a specified number of bytes. All of the options
- that LINK supports are described in Chapter 5.
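- The two BASIC statements described above might be used as sketched here.
- The figure of 5000 bytes is only an example, and the STACK line is left as
- a comment because it applies to BASIC PDS only.
-
```basic
'QuickBASIC and BASIC PDS: place this first, since it also clears all data
CLEAR , , 5000
PRINT "Stack space available:"; FRE(-2)

'BASIC PDS only: the same request without the side effects of CLEAR
'STACK 5000
```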
-
-
- RECURSION
-
- I have already illustrated some of the situations in which a recursive
- subprogram or function could be useful. Now let's look at some actual
- programming examples. The Evaluate function in the listing below uses
- recursion to reinvoke itself for each new level of parentheses it
- encounters.
-
-
- DECLARE FUNCTION Evaluate# (Formula$)
-
- INPUT "Enter an expression: ", Expr$
- PRINT "That evaluates to"; Evaluate#(Expr$)
-
- FUNCTION Evaluate# (Formula$)
-
- 'Search for an operator using INSTR as a table lookup. If found,
- 'remember which one and its position in the string.
- FOR Position% = 1 TO LEN(Formula$)
- Operation% = INSTR("+-*/", MID$(Formula$, Position%, 1))
- IF Operation% THEN EXIT FOR
- NEXT
-
- 'Get the value of the left part, and a tentative value for the
- 'right part.
- LeftVal# = VAL(Formula$)
- RightVal# = VAL(MID$(Formula$, Position% + 1))
-
- 'See if there's another level to evaluate.
- Paren% = INSTR(Position%, Formula$, "(")
-
- 'There is, call ourselves for a new RightVal#.
- IF Paren% THEN RightVal# = Evaluate#(MID$(Formula$, Paren% + 1))
-
- 'No more to evaluate, do the appropriate operation and exit.
- SELECT CASE Operation%
- CASE 1 'addition
- Evaluate# = LeftVal# + RightVal#
- CASE 2 'subtraction
- Evaluate# = LeftVal# - RightVal#
- CASE 3 'multiplication
- Evaluate# = LeftVal# * RightVal#
- CASE 4 'division
- Evaluate# = LeftVal# / RightVal#
- END SELECT
-
- END FUNCTION
-
-
- When you run this program, enter an expression like 15 * (12 + (100 / 8)).
- To keep the code to a minimum, Evaluate accepts only simple, two-number
- expressions. That is, it will not work with more than one math operator
- within each pair of parentheses as in 10 * (3 + 4 + 5). However, the
- parentheses may be nested to nearly any level.
- This function begins by examining each character in the incoming formula
- string for a math operator. If it finds one the operator number (1 through
- 4) is remembered, as well as its position in the formula string. Next, VAL
- is used to obtain the value of the digits to the left of the operator, as
- well as the digits to the right. Notice that it was not necessary to use
- LEFT$ to isolate the left-most portion of the string, because VAL skips
- blanks and stops examining the string at the first character that cannot be
- part of a number, such as the "+" or "(".
- Once these values have been saved, the next test determines if any more
- parentheses follow in the formula. If so, Evaluate calls itself, passing
- only those characters that are beyond the next parenthesis. Thus, the same
- routine evaluates each new level, returning to the level above only after
- all levels have been examined. I encourage you to run this program in the
- QuickBASIC editing environment, and step through each statement one by one
- with the F8 Trace command. In particular, use the Watch Variable feature
- to view the value of Position% and LeftVal# as the function recurses into
- subsequent invocations.
- It is important to understand the need for stack variables in this
- program, and why STATIC must not be used in the function definition. When
- Evaluate walks through the incoming string and determines which math
- operator is specified, that operator must be remembered throughout the
- course of the function. If a static variable were used for Operation%,
- then its previous value would be destroyed when Evaluate calls itself.
- Likewise, LeftVal# cannot be overwritten either, or it would not hold the
- correct value when Evaluate returns to itself from the level below.
- Therefore, as you step through this program you will observe that each new
- invocation of Evaluate creates a new set of variables.
- As you can see, stack variables are necessary for the proper functioning
- of a subprogram or function that calls itself. They are also necessary
- when one procedure calls another procedure which in turn calls the first
- one again. The key point is that each time a non-static routine is
- invoked, new and unique variables must be created. Otherwise, the variable
- contents from a previous level above will be overwritten.
- Although recursion is a powerful and often essential technique, it should
- be used only when necessary. A substantial amount of overhead is needed to
- allocate stack memory and clear it to zeros, so invoking a non-static
- routine is relatively slow. And as I described earlier, every non-static
- string variable must be deleted when the routine exits, at a cost of 9
- bytes apiece.
- Some programmers use recursion even when there are other, more efficient
- ways to solve a problem. For example, the QuickBASIC manual shows a
- recursive function that calculates a factorial. (A factorial is derived
- by multiplying a number by all of the positive whole numbers less than
- itself. That is, the factorial of 4 equals 4 * 3 * 2 * 1.) However, a
- factorial can be calculated faster and with less code using a simple
- FOR/NEXT loop as
- shown below. This version of Factorial is 20 percent faster than the
- example given in the QuickBASIC manual.
-
-
- FUNCTION Factorial#(Number%) STATIC
- Seed# = 1
- FOR X% = 1 TO Number%
- Seed# = Seed# * X%
- NEXT
- Factorial# = Seed#
- END FUNCTION
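-
-
- For comparison, a recursive version might look like the sketch below. This
- is not the listing from the QuickBASIC manual; RFactorial# is a
- hypothetical name used only for illustration. STATIC is deliberately
- omitted, so that the temporary value passed to each new level is kept on
- the stack.
-
```basic
DECLARE FUNCTION RFactorial# (Number%)

PRINT RFactorial#(5)          '5 * 4 * 3 * 2 * 1 = 120
END

FUNCTION RFactorial# (Number%)
   IF Number% <= 1 THEN
      RFactorial# = 1         'the recursion ends here
   ELSE
      RFactorial# = Number% * RFactorial#(Number% - 1)
   END IF
END FUNCTION
```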
-
-
- PASSING PARAMETERS TO PROCEDURES
-
- As you have already learned, BASIC normally passes data to a subprogram or
- function by placing its address on the stack. And when an entire array is
- specified, the address of the array descriptor is sent instead. But there
- are some cases where BASIC imposes restrictions on how variables and arrays
- may be passed to a procedure. Let's look now at some of the ways to get
- around those restrictions.
- When using versions of BASIC earlier than PDS 7.1, it is not legal to
- pass an array of fixed-length strings. In fact, it is also impossible to
- pass a single fixed-length string directly. As you saw in Chapter 2, BASIC
- copies every fixed-length string argument to a regular string, which adds
- a lot of code and also wastes string memory.
- The simplest solution for fixed-length strings is to define an
- equivalent TYPE that is comprised of a single string component. Since a
- TYPE variable or array can legally be passed, this is the easiest and most
- direct approach, as shown here.
-
-
- TYPE FLen
- S AS STRING * 100
- END TYPE
- DIM MyString AS FLen
- CALL Subprogram(MyString)
-
- SUB Subprogram(FLString AS FLen)
- ...
- ...
- END SUB
-
-
- If the subprogram being called is in a separate module, then the TYPE
- definition must also be present in that file. However, the DIM statement
- is needed only in the program that passes the string. This also works with
- fixed-length string arrays, except that the DIM would have to be changed
- to DIM MyArray(1 TO NumElements) AS FLen, and the subprogram's definition
- would be changed to SUB Subprogram(FLString() AS FLen).
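- A sketch of the array form follows. The subprogram ShowAll, the constant
- NumElements%, and the sample data are assumptions made for illustration.
-
```basic
TYPE FLen
   S AS STRING * 100
END TYPE
DECLARE SUB ShowAll (FLString() AS FLen)

CONST NumElements% = 5
DIM MyArray(1 TO NumElements%) AS FLen

FOR X% = 1 TO NumElements%
   MyArray(X%).S = "Item" + STR$(X%)
NEXT
CALL ShowAll(MyArray())
END

SUB ShowAll (FLString() AS FLen) STATIC
   FOR X% = LBOUND(FLString) TO UBOUND(FLString)
      PRINT RTRIM$(FLString(X%).S)    'strip the trailing padding
   NEXT
END SUB
```
-
- Note that the TYPE definition appears before the DECLARE statement, because
- the declaration refers to FLen.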
- BASIC PDS 7.1 supports passing a fixed-length string array directly, so
- this work-around is not needed with that version. Curiously, a single
- fixed-length string may not be passed as a parameter in BASIC 7.1. Since
- a fixed-length string is closely related to a TYPE variable, this
- limitation seems arbitrary at best.
- BASIC 7.1 also supports the use of BYVAL when passing numeric arguments
- to procedures. This is a particularly powerful feature, because it can
- greatly reduce the amount of code needed to access those values within the
- routine. It also eliminates the need to make copies when a constant is
- passed as an argument. To take advantage of this feature, you simply
- specify BYVAL in both the DECLARE statement and the procedure's own
- parameter list, as shown below.
-
-
- DECLARE SUB Subroutine(BYVAL Arg1%, BYVAL Arg2%)
- CALL Subroutine(Var1%, Var2%)
-
- SUB Subroutine(BYVAL X%, BYVAL Y%)
- ...
- ...
- END SUB
-
-
- Because the actual value of the argument is being passed, there is no way
- to return information through that argument. But in those situations where
- an assignment to the original variable from within the routine is not
- needed, BYVAL can eliminate a lot of compiler-generated code when dealing
- with integers. Of course, you may use a mix of BYVAL and non-BYVAL
- parameters if you need the benefits of both methods in a single call.
- As proof of this savings, disassemblies of a one-statement subprogram
- designed both ways are presented below, to show how an integer parameter is
- accessed when it is passed by address and by value.
-
-
- SUB ByAddress(Param%) STATIC
- LocVar% = Param%
- MOV SI,[Param%] ;get the address of Param%
- MOV AX,[SI] ;then read the value there
- MOV LocVar%,AX ;assign that to LocVar%
- END SUB
-
-
- SUB ByValue(BYVAL Param%) STATIC
- LocVar% = Param%
- MOV AX,Param% ;read Param% directly
- MOV LocVar%,AX ;and assign it to LocVar%
- END SUB
-
-
- Note that the savings are only within the subroutine, and not when it is
- called. That is, 4 bytes are needed to pass an integer variable whether
- by address or by value. In fact, passing larger data types requires more
- code to pass by value. Any variable can be passed by address with 4 bytes
- of compiler-generated code, because what is sent is a single address. But
- to pass a double precision number by value requires 16 bytes, since 4 bytes
- of code are needed for each 2-byte portion of the number.
- In general, passing variables as parameters to a subprogram or function
- is preferable to sharing them. When many variables are shared throughout
- a program, you run the risk of introducing bugs caused by accidentally
- using the same variable name more than once. However, sharing has some
- definite advantages in at least two situations.
- The first is when a procedure must be accessed as quickly as possible.
- Since a finite amount of code is needed to pass each parameter, some amount
- of time is also required to execute that code. Therefore, sharing a few,
- carefully selected variables can improve the speed of your programs and
- reduce their size as well. Another important use for SHARED is to conserve
- data memory. Nearly all programs use at least a few temporary scratch
- variables, perhaps as FOR/NEXT loop counters. By dimensioning several such
- variables as being shared throughout a program, the same variables can be
- used repeatedly. I often begin programs with a DIM SHARED statement such
- as DIM SHARED X, Y, Z, and then use those variables as often as possible.
- One final trick I want to share is how to pass a large number of
- parameters using less code than would normally be necessary. Each argument
- that is passed to a procedure requires 4 bytes of code. In a complicated
- routine that needs many parameters, this can quickly add up. Worse, these
- bytes are added for every call. Therefore, a subprogram that accepts 10
- parameters and is called 20 times will add 800 bytes to the final
- executable file just to handle the parameters!
- One solution is to use an array, which is ideal when all of the
- parameters are the same type of data. An entire array can be passed as a
- single parameter since only the array descriptor's address is needed. Even
- better, however, is to create a TYPE variable, and then assign all of the
- parameters to it. A TYPE variable can hold nearly any amount and type of
- data, and it too can be passed using only 4 bytes. Although this does
- require a separate assignment for each TYPE component, you simply use the
- TYPE where the regular variables would have been assigned. By eliminating
- the added code to pass many parameters, programs that use a TYPE this way
- will also be much faster.
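- The sketch below shows one way this might look. The WindowInfo type, its
- member names, and the DrawBox subprogram are all hypothetical.
-
```basic
TYPE WindowInfo               'hypothetical parameter block
   ULRow AS INTEGER
   ULCol AS INTEGER
   LRRow AS INTEGER
   LRCol AS INTEGER
   Colr  AS INTEGER
END TYPE
DECLARE SUB DrawBox (W AS WindowInfo)

DIM Win AS WindowInfo
Win.ULRow = 5: Win.ULCol = 10
Win.LRRow = 15: Win.LRCol = 70
Win.Colr = 7
CALL DrawBox(Win)             'one address is passed instead of five

SUB DrawBox (W AS WindowInfo) STATIC
   COLOR W.Colr
   LOCATE W.ULRow, W.ULCol: PRINT "+";    'mark the upper left corner
   LOCATE W.LRRow, W.LRCol: PRINT "+";    'mark the lower right corner
END SUB
```
-
- Members that do not change from one call to the next need to be assigned
- only once, which reduces the code at each call even further.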
-
-
- MODULAR PROGRAMMING
-
- QuickBASIC versions 4.0 and later let you load subprograms and functions
- from multiple files into the editing environment at the same time. This
- further enhances their reusability, since the different modules can be
- treated as "black boxes" whose purpose is already known. Once a routine
- has been developed and debugged, it can be used again and again, without
- further regard for the names of the variables within the routines. Indeed,
- many of the utility routines included with this book are provided as
- separate modules, intended to be loaded along with your programs.
- Any variable name can be passed as an argument to a procedure, even if
- a different name is used to represent the same variable within the
- procedure. If you have defined a subprogram such as SUB MySub(X%, Y!, Z$),
- then you could call it using CALL MySub(A%, B!, C$). Of course, the
- variables you pass must be of the same data type as the subroutine expects.
- Because reusability is an important consideration in the design of any
- procedure, it generally makes sense to store it in its own source file.
- This lets you combine the same module repeatedly with any number of
- programs. The alternative would be to merge the file into each program
- that needs it. But maintaining multiple copies of the same code wastes
- disk space. Further, if a bug is found in the routine, you will have to
- identify all of the programs that contain it, and manually correct each
- one of them.
- Another important advantage of using separate files is that you can
- exceed the usual 64K code size barrier. Unlike the data segment which is
- comprised of the sum of all data in all modules, an .EXE file can contain
- multiple code segments. Each BASIC module has a single code segment, and
- each of these can be as large as 64K. In fact, dividing a program into
- separate files is the *only* way to exceed the usual 64K code size
- limitation.
- Although using a separate source file for each subprogram makes sense
- in many situations, there is one slight disadvantage. When all of the
- various program modules are linked together, each separate module adds
- approximately 100 bytes of overhead. Nonetheless, for all but the
- smallest programming projects, the advantages of using separate modules
- will probably outweigh the slight increase in code size.
-
-
- INCLUDE FILES
-
- Another useful BASIC feature that can help you to create modular programs
- is the Include file. An Include file is a separate file that is read and
- processed by BASIC at a specified place in your program. The statement
- '$INCLUDE: 'filename' tells QB or BC to add the statements in the named
- file to your source code, as if that code had been entered manually. If
- a file extension is not given, then .BAS is assumed. Many of the files
- that Microsoft provides with QuickBASIC use a .BI extension, which stands
- for "BASIC Include". Some programmers use .INC, and you may use whatever
- seems appropriate to the contents of the file.
- Include files are ideal for storing DECLARE, CONST, TYPE, and COMMON
- statements. Except for COMMON, none of these statements add to the size
- of your program, and none of them create any executable code. Therefore,
- you could create a single include file that is used for an entire project,
- and add an appropriate '$INCLUDE directive to the beginning of each program
- source file. DECLARE and CONST statements and TYPE definitions that are
- never referenced are simply ignored by BASIC. However, they do impinge
- slightly on available memory within the QuickBASIC editor, since BASIC has
- no way to know that they are not being used. Similarly, BC must keep track
- of the information in these statements as it compiles your program. But
- again, there is no impact on the size of your final executable program.
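- A project-wide include file might be organized as in the sketch below,
- which shows two separate files in one listing. The file names PROJECT.BI
- and MAIN.BAS, and everything declared in them, are hypothetical.
-
```basic
'----- PROJECT.BI: declarations only, no executable statements
CONST MaxItems% = 100
TYPE Item
   Descr AS STRING * 20
   Price AS SINGLE
END TYPE
DECLARE SUB ShowItem (I AS Item)

'----- MAIN.BAS: each source file in the project begins the same way
'$INCLUDE: 'PROJECT.BI'
DIM Stock(1 TO MaxItems%) AS Item
```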
- In general, I recommend that you avoid placing any executable statements
- into an include file. Because the code in an include file is normally
- hidden from your view, it is easy to miss a key statement that is causing
- a bug. Likewise, a '$DYNAMIC or '$STATIC command hidden within an include
- file will obscure the true type of any arrays that are subsequently
- dimensioned. Perhaps worst of all is placing a DEFINT or other DEFtype
- statement there, for the same reason.
-
-
- QUICK LIBRARIES
-
- Quick Libraries contribute to modular programming in two important ways.
- Perhaps the most important use for a Quick Library is to allow access to
- subprograms and functions that are not written in BASIC. All DOS programs
- and subroutines--regardless of the language they were originally written
- in--end up as .OBJ files suitable for LINK to join together. But the QB
- and QBX editing environments manipulate BASIC source code, and interpret
- the commands rather than truly compile them. Therefore, the only way you
- can access a routine written in assembly language or C within QuickBASIC
- is by placing the routine into a Quick Library.
- Quick Libraries also let you store completed BASIC subprograms and
- functions out of the way from the rest of your program. If you have a
- large number of subroutines in one program, the list of names displayed
- when F2 is pressed can be very long and confusing. Since QuickBASIC does
- not display the routines in a Quick Library, there will be that many fewer
- names to deal with. Another advantage of placing pre-compiled BASIC
- routines into a Quick Library is that they can take less memory than when
- the BASIC source code is loaded as a module. This is true especially when
- you have many comments in the program, since comments are of course not
- compiled.
- Be aware that there are a few disadvantages to placing BASIC code into
- a Quick Library. One is that you cannot step and trace through the code,
- since it is not in its original BASIC source form. Another is that Quick
- Libraries are always stored in normal DOS memory, as opposed to expanded
- memory which QBX [and VB/DOS] can use. When a BASIC subprogram or function
- is less than 16K in size and EMS is present, QBX [and VB/DOS] will place
- its source code in expanded memory to free up as much conventional memory
- as possible.
-
-
- ERROR AND EVENT HANDLING
- ========================
-
- As a BASIC programmer, you must deal with several types of errors in a
- program. These errors fall into two general categories: compile
- errors and runtime errors. Compile errors are those that QB or BC issue,
- such as "Syntax error" or "Include file not found". Generally, these are
- easy to understand and correct, because the QuickBASIC editor places the
- cursor beneath the offending statement. In some cases, however, the error
- that is reported is incorrect. For example, if your program uses a
- function in a Quick Library that expects a string parameter and you forgot
- to declare it, BASIC reports a "Type mismatch" error. After all, with a
- statement such as X = FuncName%(Some$), how could BASIC know that FuncName%
- is not simply an integer array? Assuming that it is an array, BASIC
- rejects Some$ as being illegal for an element number.
- Runtime errors are those such as "File not found" which are issued when
- your program tries to open a file that doesn't exist, or is not in the
- specified directory. Other common runtime errors are "Illegal function
- call", "Out of string space", and "Input past end". Many of these errors
- can be avoided by an explicit test. If you are concerned that string space
- might be limited you can query the FRE("") function before dimensioning a
- dynamic string array. However, some errors are more difficult to
- anticipate. For example, to determine if a particular directory exists you
- must use CALL Interrupt to query a DOS service.
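- The string space test might be sketched as follows. This assumes the
- 4-byte near string descriptors described in Chapter 2; the array name
- Text$ and the 1,024-byte safety margin are arbitrary choices.
-
```basic
NumElements% = 2000
Needed& = NumElements% * 4& + 1024    'descriptors plus a safety margin

IF FRE("") > Needed& THEN
   REDIM Text$(1 TO NumElements%)
   PRINT "Array created"
ELSE
   PRINT "Not enough string memory for"; NumElements%; "elements"
END IF
```
-
- Keep in mind that this accounts only for the descriptors. The string data
- assigned to the elements later will need additional memory.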
- The conventional way to handle errors is to use ON ERROR, and design an
- error handling subroutine. There are a number of problems with using ON
- ERROR, and most professional programmers try to avoid using it whenever
- possible. But ON ERROR does work, and it is often the simplest and most
- direct solution in many programs. The short listing below shows the
- minimum steps necessary to implement an error handler using ON ERROR.
-
-
- ON ERROR GOTO HandleErr
- FILES "*.XYZ"
- END
-
- HandleErr:
- SELECT CASE ERR
- CASE 53: PRINT "File not found"
- CASE 68: PRINT "Device unavailable"
- CASE 71: PRINT "Disk not ready"
- CASE 76: PRINT "Path not found"
- CASE ELSE: PRINT "Error number"; ERR
- END SELECT
- RESUME NEXT
-
-
- The statement ON ERROR GOTO HandleErr tells BASIC that if an error occurs,
- the program should jump to the HandleErr label. Without ON ERROR, the
- program would display an error message and then end. Since it is unlikely
- that you have any files with an .XYZ extension, BASIC will go to the error
- handler when this program is run. Within the error handling routine, the
- program uses the ERR function to determine the number of the error that
- occurred. Had line numbers been used in the program, the line number in
- which the error occurred would also be available with the ERL function.
- In this brief program fragment, the most likely error numbers are
- filtered through a SELECT CASE block, and any others will be reported by
- number. Regardless of which error occurred, a RESUME NEXT statement is
- used to resume execution at the next program statement. RESUME can also
- be used with an explicit line label or number to resume there; if no
- argument is given BASIC resumes execution at the line that caused the
- error. In many cases a plain RESUME will cause the program to enter an
- endless loop, because the error will keep happening repeatedly.
- In this case, the file will not exist no matter how many times BASIC
- tries to find it. Therefore, a plain RESUME is not appropriate following
- a "File not found" or similar error. Had the error been "Disk not ready",
- you could prompt the user to check the drive and then press a key to try
- again. In that case, then, RESUME would make sense. Although BASIC's ON
- ERROR can be useful, it does have a number of inherent limitations.
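- A handler that retries only when retrying makes sense could be sketched
- like this. The drive letter and the use of the Escape key to give up are
- assumptions made for the example.
-
```basic
ON ERROR GOTO HandleErr
FILES "A:*.*"
END

HandleErr:
   IF ERR = 71 THEN           'Disk not ready
      PRINT "Check the drive, then press a key (Esc to give up)"
      DO
         K$ = INKEY$
      LOOP UNTIL LEN(K$)
      IF K$ = CHR$(27) THEN RESUME NEXT
      RESUME                  'try the same statement again
   ELSE
      PRINT "Error number"; ERR
      RESUME NEXT             'retrying would only repeat the error
   END IF
```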
- Perhaps the worst problem with ON ERROR is that it often increases the
- program's size. When you use RESUME NEXT, you must also use the /x compile
- switch. Unfortunately, /x adds internal address labels to show where each
- statement begins, so the RESUME statement can find the line that caused the
- error. These labels are included within the compiled code and therefore
- increase its size.
- Another problem with ON ERROR is that it can hide what is really
- happening in a program. I recommend strongly that you REM out all ON ERROR
- statements while working in the QuickBASIC editing environment. Otherwise,
- an Illegal function call or other error may cause QuickBASIC to go to your
- error handler, and that handler might ignore it if the error is not one you
- were expecting and testing for. If that happens and your program uses
- RESUME NEXT, you might never even know that an error occurred!
- Yet another problem with ON ERROR is that it's frankly a clumsy way to
- program. Most languages let you test for the success or failure of the
- most recent operation, and act on or ignore the results at your discretion.
- Pascal, for example, uses the IOResult function to indicate if an error
- occurred during the last input or output operation.
- Finally, BASIC generates errors for many otherwise proper circumstances,
- such as the FILES statement above. You might think that if no files were
- found that matched the .XYZ extension given, then BASIC would simply not
- display anything. Indeed, an important part of toolbox products such as
- Crescent Software's QuickPak Professional are the routines that replace
- BASIC's file handling statements. By providing replacement routines that
- let you test for errors without an explicit ON ERROR statement, an add-on
- library can help to improve the organization of your programs.
- As I mentioned earlier, some errors can be avoided by using CALL
- Interrupt to access DOS directly. (One important DOS service lets you see
- if a file exists before attempting to open it.) But critical errors such
- as those caused by an open drive door require assembly language. In
- Chapter 12 you will learn how to bypass BASIC and access DOS directly using
- CALL Interrupt.
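-
- As a rough preview of the file test mentioned above, the sketch below
- asks DOS service 4Eh (find first matching file) whether a file exists.
- It is not the routine presented later in this book. It assumes that the
- QB.BI include file and the QB.QLB or QB.LIB library that ship with
- QuickBASIC are available, and it uses the InterruptX form of the call so
- the string's segment can be passed explicitly. The file name is only an
- example.
-
- '$INCLUDE: 'QB.BI'
- DIM Regs AS RegTypeX
-
- Spec$ = "CONFIG.SYS" + CHR$(0)      'DOS expects a zero-terminated name
- Regs.ax = &H4E00                    'service 4Eh: find first matching file
- Regs.cx = 0                         'look for normal files only
- Regs.ds = VARSEG(Spec$)             'segment and address of the name
- Regs.dx = SADD(Spec$)
- CALL INTERRUPTX(&H21, Regs, Regs)
-
- IF Regs.flags AND 1 THEN            'Carry flag set: no match was found
-   PRINT "File not found"
- ELSE
-   PRINT "File exists"
- END IF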
-
-
- EVENT HANDLING
-
- BASIC includes several forms of event handling, and like ON ERROR, these
- too are avoided when possible by many professional programmers. Event
- handling lets your programs perform a GOSUB automatically and without any
- action on your part, based on one or more conditions. Some of the more
- commonly used event statements are ON KEY, ON TIMER, and ON COM. With ON
- KEY, you can specify that a particular key or combination of keys will
- temporarily halt the program, and branch to a GOSUB routine designated as
- the ON KEY handler. ON TIMER is similar, except it performs a GOSUB at
- regular intervals based on BASIC's TIMER function. Likewise, ON COM
- performs a GOSUB whenever a character is received at the specified
- communications port.
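- As a minimal sketch of the syntax involved, the fragment below updates a
- clock display once per second while the main loop waits for a key press.
- The ShowTime label and the screen position are arbitrary choices made for
- illustration. When compiling with BC the /v switch is assumed, because
- the loop contains no labeled statements.
-
- ON TIMER(1) GOSUB ShowTime          'call ShowTime once each second
- TIMER ON
- DO
- LOOP UNTIL LEN(INKEY$)              'idle until any key is pressed
- TIMER OFF
- END
-
- ShowTime:
-   LOCATE 1, 70
-   PRINT TIME$;
-   RETURN
-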
- The concept of event handling is very powerful indeed. For example, ON
- COM allows your program to go about its business, and also handle
- characters as they arrive at the communications port. ON TIMER lets you
- simulate a crude form of multi-tasking, where control is transferred to a
- separate subroutine at one second intervals. Unfortunately, BASIC's event
- handling is not truly interrupt driven, and the resulting code to implement
- it adds considerably to a program's size.
- When any of the event handling methods are used, BASIC calls an internal
- event dispatcher periodically in your program. These calls add five bytes
- apiece, and one is added at either every statement, or at every labeled
- statement [depending on whether you compiled using /v or /w respectively].
- This can increase your program's size considerably. Even worse, the
- repeated calls have an adverse effect on the speed of most programs. Like
- ON ERROR, BASIC's event handling statements provide a simple solution that
- is effective in many programming situations. And also like ON ERROR, they
- are best avoided in important programming projects.
- Using purely BASIC techniques, the only alternative to event trapping
- is polling. Polling simply means that your program manually checks for
- events, instead of letting BASIC do it automatically. The primary
- advantage of polling is that you can control when and where this checking
- occurs. The disadvantage is that it requires more effort by you.
- To see if any characters have been received from a communications port
- but are still waiting to be read, you would use the LOC function. And to
- see if a given amount of time has elapsed you must query the TIMER function
- periodically. If true interrupt driven event handling were available in
- BASIC, that would clearly be preferable to either of the two available
- methods. However, only with Crescent's P.D.Q. product can such capability
- be added to a BASIC program.
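- A polling loop along those lines might look like the sketch below. The
- port settings, the one second interval, and the variable names are
- assumptions made for illustration, and the midnight rollover of TIMER is
- ignored for brevity.
-
- DEFINT A-Z
- OPEN "COM1:9600,N,8,1" FOR RANDOM AS #1
- NextTime! = TIMER + 1
- DO
-   Pending = LOC(1)                  'characters waiting at the port
-   IF Pending THEN PRINT INPUT$(Pending, #1);
-   IF TIMER >= NextTime! THEN        'has a second elapsed?
-     NextTime! = TIMER + 1
-     'periodic work goes here
-   END IF
- LOOP UNTIL INKEY$ = CHR$(27)        'Escape ends the program
- CLOSE #1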
-
-
- PROGRAMMING STYLE
-
- Programming style is a personal issue, and every programmer develops his
- or her own particular methods over time. Some aspects of programming style
- have little or no impact on the quality of the final result. For example,
- the number of columns you indent a FOR/NEXT loop will not affect how
- quickly a sort routine operates. But there are style factors that can help
- or harm your programs. One is that clearly commenting your code will help
- you to understand and improve it later. Another is when more than one
- programmer is working on a large project simultaneously. If neither
- programmer can figure out what the other is doing, the program's quality
- will no doubt suffer.
- Clearly, no one can or even should try to force a particular style or
- ideology upon you. However, I would like to share some of the decisions
- that I have made over the years, and explain why they make sense to me.
- Of course, you are free to use or not use these opinions as you see fit.
- Programmers are as varied as the members of any other discipline, and no
- one set of rules could possibly serve everyone equally. Whatever conventions
- you settle upon, be consistent above all else.
- The most important convention that I follow is to use DEFINT A-Z as the
- first statement in every program. For me, using integers verges on
- religion, and my fingers could type DEFINT even if I were asleep. As I
- have stated repeatedly, integers should be used whenever possible, unless
- you have a compelling reason not to. Integers are much faster and smaller
- than any other variable type BASIC offers. Nearly all of the available
- third party add-on products use integer parameters wherever possible, and
- so should the routines you write. The only reasonable exception to this
- is when writing financial or scientific programs, or other math-intensive
- applications.
- Equally important is adding sufficient and appropriate comments. Some
- programmers like to use comment headers that identify each related block
- of code; others prefer to comment every line. I recommend doing both,
- especially if other people will be reading your programs. I also prefer
- using an apostrophe as a comment delimiter, rather than the more formal
- REM. There are only so many columns available for each comment line, and
- it seems a shame to waste the space REM requires.
- When writing a subprogram or function that you plan to use again in
- other projects, include a complete heading comment that shows the purpose
- of the routine and the parameters it expects. If each parameter is listed
- neatly at the beginning of the file, you can create a hardcopy index of
- routines by printing that section of each file.
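- A heading comment in that spirit might look like the following. The
- routine and its parameters are hypothetical, and an 80-column screen is
- assumed.
-
- '----- CenterText: prints a string centered on a given screen row
- '
- 'Parameters:
- '  Text$ - the string to display (fewer than 80 characters)
- '  Row%  - the screen row to print on
- '
- 'Note: the cursor position is not preserved.
- '
- SUB CenterText (Text$, Row%) STATIC
-   LOCATE Row%, (80 - LEN(Text$)) \ 2 + 1
-   PRINT Text$;
- END SUB
-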
- Avoid comments that are obvious or redundant, such as this:
-
- Count = Count + 1 'increment Count
-
- If Count is keeping track of the number of lines read from a file, a more
- appropriate comment would be 'show that another line was read. Also avoid
- comments that are too cute or flip. Simply state clearly what is happening
- so you will know what you had in mind when you come back to the program
- next month or next year.
- Selecting meaningful variable names is equally valuable in the overall
- design of a program. If you are keeping track of the current line in a
- file, use a variable name such as CurLine. Although BASIC in some cases
- lets you use a reserved word as a variable name, I recommend against that.
- Over the years, different versions of BASIC have allowed or disallowed
- different keywords for variables. While QuickBASIC 4.5 lets you use Name$
- as a variable, there is no guarantee that the next version will. Also, be
- aware that variable names which begin with the letters Fn are illegal,
- because BASIC reserves that for user-defined functions. Using the variable
- FName$ to hold a file name may look legal, but it isn't.
- Don't be ashamed to use GOTO when it is appropriate. There are many
- places where GOTO is the most direct way to accomplish something. As I
- showed earlier in this chapter, GOTO when used correctly can sometimes
- produce smaller and faster code than any other method.
- Use line labels instead of line numbers. The statement GOSUB 1020
- doesn't provide any indication as to what happens at line 1020. GOSUB
- OpenFile, on the other hand, reads like plain English. The only exception
- to this is when you are debugging a program that crashes with the message
- "Illegal function call at line no line number". In that case, you should
- *add* line numbers to your program and run it again. A program that reads
- a source file and prints each line to another file with sequential numbers
- is trivial to write. I will also discuss debugging in depth in Chapter 4.
- Even though using DEFINT is supposed to force all subsequent CONST, DEF
- FN, and FUNCTION declarations to be integer, a bug in QuickBASIC causes
- untyped names to occasionally assume the single precision default.
- Therefore, I always use an explicit percent sign (%) to establish each
- function's type. In fact, I use whatever type identifier is appropriate
- for functions and CONST statements, to make them easily distinguishable in
- the program listing. For example, in the statement IF CurRow > MaxRows%
- THEN CurRow = MaxRows%, I know that MaxRows% has been defined as a
- constant. Some people prefer to use all upper-case letters for constants,
- though I prefer to reserve upper case for BASIC keywords.
- Although BASIC supports the optional AS INTEGER and AS SINGLE directives
- when defining a subprogram or function, that wastes a lot of screen space.
- I greatly prefer using the variable type identifiers. That is, I will use
- SUB MySub(A%, B!) rather than SUB MySub(A AS INTEGER, B AS SINGLE). The
- same information is conveyed but with a lot less effort and screen clutter.
- A well-behaved subroutine will restore the PC to the state it was in when
- called. If you have a subprogram that prints a string centered on the bottom
- line of the screen, use CSRLIN and POS(0) to read the current cursor
- location before you change it. Then restore the cursor before you exit.
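- A sketch of that idea follows, using a hypothetical subprogram name and
- assuming an 80-column, 25-line screen and a message shorter than 80
- characters.
-
- SUB PrintBottom (Msg$) STATIC
-   OldRow% = CSRLIN                  'remember where the cursor was
-   OldCol% = POS(0)
-   LOCATE 25, (80 - LEN(Msg$)) \ 2 + 1
-   PRINT Msg$;                       'the semicolon prevents scrolling
-   LOCATE OldRow%, OldCol%           'put the cursor back
- END SUB
-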
- I like to indent two spaces within FOR/NEXT and IF/THEN blocks.
- Although some people prefer indenting four or even eight columns for each
- level, that can quickly get out of hand when the blocks are deeply nested.
- Nothing is harder to read than code that extends beyond the edge of the
- screen. But whatever you do, please *do not* change the tab stop settings
- in the QuickBASIC editor, unless you are the only one who will ever have
- to look at your code. Even though the program may look fine on your
- screen, the indentation will be completely wrong on everyone else's PC.
- When creating a dynamic array I prefer REDIM to a previous '$DYNAMIC
- statement. REDIM is clearer because it shows at the point in the source
- where the array is dimensioned that this is a dynamic array. Otherwise you
- have to scan backwards through your source code looking for the most recent
- '$DYNAMIC or '$STATIC, to see what type of array it really is. By the same
- token, using ever-changing DEFtype statements throughout your code is poor
- practice. Further, if a variable is a string, always include the dollar
- sign ($) suffix when you reference it. If you use DEFSTR S or even worse,
- DIM xxx AS STRING and then omit the dollar sign, nobody else will
- understand your program.
- I also prefer to explicitly dimension all arrays, and not let BC create
- them with the 11-element default (including element zero). If you need
- fewer than 11 elements, the memory is wasted. And if you need more, then
- your program will behave unpredictably. Not dimensioning every array is
- sloppy programming. Period.
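- The short sketch below shows both preferences side by side. The array
- names and sizes are arbitrary.
-
- DEFINT A-Z
- DIM Codes(1 TO 5)                   'static, and exactly five elements
- NumLines = 500
- REDIM Lines$(1 TO NumLines)         'dynamic, as REDIM makes plain
-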
- Avoid repeated calls to BASIC's internal functions if possible. In the
- listing below, the first example creates 61 bytes of code, while the second
- generates only 46 bytes.
-
-
- Not recommended:
-
- IF CSRLIN = 1 OR CSRLIN = 6 OR CSRLIN = 12 THEN
- ...
- END IF
-
-
- Much better:
-
- Temp = CSRLIN
- IF Temp = 1 OR Temp = 6 OR Temp = 12 THEN
- ...
- END IF
-
-
- As I stated earlier in this chapter, using SELECT CASE instead of IF will
- also eliminate this problem. Many BASIC statements are translated into
- calls, and each call takes a minimum of five bytes.
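- For comparison, here is a sketch of the same test written with SELECT
- CASE. CSRLIN appears only once, and no temporary variable is needed.
-
- SELECT CASE CSRLIN
-   CASE 1, 6, 12
-     'statements for rows 1, 6, and 12 go here
- END SELECT
-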
- Your programs will be easier to read if you evaluate temporary
- expressions separately. Even though BASIC lets you nest parentheses to
- nearly any level, nothing is gained by packing many expressions into a
- single statement. In the examples below that strip the extension from a
- file name, the first creates only a few bytes less code. Although this may
- seem counter to the other advice I have given, a slight code increase is
- often more than offset by a commensurate improvement in clarity.
-
-
- File$ = LEFT$(File$, INSTR(File$, ".") - 1)
-
-
- Dot = INSTR(File$, ".")
- File$ = LEFT$(File$, Dot - 1)
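-
-
- Separating the steps also leaves room for a safety check. INSTR returns
- zero when the name contains no period, and LEFT$ would then be asked for
- -1 characters, which causes an Illegal function call error. The variation
- below is offered only as a sketch of one possible guard:
-
-
- Dot = INSTR(File$, ".")
- IF Dot THEN File$ = LEFT$(File$, Dot - 1)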
-
-
- The last issue I want to discuss is how to pronounce BASIC keywords and
- variable names. Don't laugh, but many programmers have no idea how to
- communicate the words LEFT$ or VARSEG over the telephone. Some people say
- "X dollar" for X$ even though "X string" is so much easier to say. Another
- keyword that's hard to verbalize is VARPTR. I prefer "var pointer" since
- it is, after all, a pointer function. CHR$(13) is pronounced "character
- string thirteen", again because that's the clearest and most
- straightforward interpretation. Likewise, INSTR is pronounced "in string" and
- LEFT$ would be said as "left string". If you're not sure how to pronounce
- something, use the closest equivalent English wording you can think of.
-
-
- SUMMARY
-
- In this chapter you have learned how BASIC's control flow statements are
- constructed, and how the compiler-generated code is similar regardless of
- which statements are used. You also learned where GOSUB and GOTO should
- be used, and when subprograms and functions are more appropriate. The
- discussion on logical operations showed how AND, OR, EQV, and XOR operate,
- and how they can be used to advantage in your programs.
- I have explained in detail exactly what recursion is, and how recursive
- subroutines can perform services that are not possible using any other
- technique. You have also learned about the importance of the stack in
- recursive and other non-static subroutines. Passing parameters to
- subprograms and functions has also been described in detail, along with
- some of the principles of modular programming and event handling.
- Finally, I have shared with you some of my own personal preferences
- regarding programming style, and when and how such conventions can make a
- difference. Although this is a personal issue, I firmly believe it is
- important to develop a consistent style and stick with it.
- In Chapter 4 you will learn debugging methods using both the QuickBASIC
- editing environment and Microsoft's CodeView debugger. The successful
- design of a program is but one part of its development. Once it has been
- written, it must also be made to work correctly and reliably. As you will
- learn, there are many techniques that can be used to identify and correct
- common programming errors.